Abstract

Motivated by problems arising in decentralized control problems and non-cooperative Nash games, we 
consider a class of strongly monotone Cartesian variational inequality (VI) problems, where the mappings 
either contain expectations or their evaluations are corrupted by error. Such complications are captured 
under the umbrella of Cartesian stochastic variational inequality problems and we consider solving such 
problems via stochastic approximation (SA) schemes. Specifically, we propose a scheme wherein the 
steplength sequence is derived by a rule that depends on problem parameters such as monotonicity and 
Lipschitz constants. The proposed scheme is seen to produce sequences that are guaranteed to converge 
almost surely to the unique solution of the problem. To cope with networked multi-agent generalizations, 
we provide requirements under which independently chosen steplength rules still possess desirable almost-sure
convergence properties. In the second part of this paper, we consider a regime where Lipschitz 
constants on the map are either unavailable or difficult to derive. Here, we present a local randomization 
technique that allows for deriving an approximation of the original mapping, which is then shown to be 
Lipschitz continuous with a prescribed constant. Using this technique, we introduce a locally randomized 
SA algorithm and provide almost sure convergence theory for the resulting sequence of iterates to an 
approximate solution of the original variational inequality problem. Finally, the paper concludes with 
some preliminary numerical results on a stochastic rate allocation problem and a stochastic Nash-Cournot 
game. 



1 Introduction 



Multi-agent system-theoretic problems can collectively capture a range of problems arising from decentralized 
control problems and noncooperative games. In static regimes, where agent problems are convex and agent 
feasibility sets are uncoupled, the associated solutions of such problems are given by the solution of a 
suitably defined Cartesian variational inequality problem. Our interest lies in settings where the mapping
arising in such problems is strongly monotone and one of the following holds: (i) the mapping contains
expectations whose analytical form is unavailable; or (ii) the evaluation of such a mapping is corrupted
by error. In either case, the appropriate problem of interest is given by a stochastic variational inequality
problem VI(X, F) that requires determining an $x^* \in X$ such that



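$$(x - x^*)^T F(x^*) \ge 0 \quad \text{for all } x \in X, \qquad (1)$$

where

$$F(x) \triangleq \begin{pmatrix} \mathbb{E}[\phi_1(x, \xi)] \\ \vdots \\ \mathbb{E}[\phi_N(x, \xi)] \end{pmatrix}, \qquad (2)$$

$\phi_i : D \times \mathbb{R}^d \to \mathbb{R}^{n_i}$, $X \subseteq D \subseteq \mathbb{R}^n$ is a closed and convex set, $D$ is an open set in $\mathbb{R}^n$, and $\sum_{i=1}^{N} n_i = n$.
Furthermore, $\xi : \Omega \to \mathbb{R}^d$ is a random variable, where $\Omega$ denotes the associated sample space and $\mathbb{E}[\cdot]$ denotes
the expectation with respect to $\xi$.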

Variational inequality problems assume relevance in capturing the solution sets of convex optimization 
and equilibrium problems [11]. Their Cartesian specializations arise from specifying the set X as a Cartesian 
product, i.e., $X = \prod_{i=1}^{N} X_i$. Such problems arise in the modeling of multi-agent decision-making problems
such as rate allocation problems in communication networks [18,31,35], noncooperative Nash games in 
communication networks [1,2,39], competitive interactions in cognitive radio networks [20,29,30,38], and 
strategic behavior in power markets [16,17,32]. Our interest lies in regimes complicated by uncertainty, which 
could arise as a result of agents facing expectation-based objectives that do not have tractable analytical 
forms. Naturally, the Cartesian stochastic variational inequality problem framework represents an expansive 
model for capturing a range of such problems. 

Two broad avenues exist for solving such a class of problems. The first, referred to as the
sample-average approximation (SAA) method, uses a set of M samples
$\{\xi_1, \ldots, \xi_M\}$ and considers the sample-average problem, where an expected mapping $\mathbb{E}[\phi(x, \xi)]$ is replaced
by the sample average $\frac{1}{M}\sum_{j=1}^{M} \phi(x, \xi_j)$. The resulting problem is deterministic and its solution provides
an estimator for the solution of the true problem. The asymptotic behavior of these estimators has been
studied extensively in the context of stochastic optimization and variational problems [23,33]. The other 
approach, referred to as stochastic approximation, also has a long tradition. Since its introduction by Robbins and
Monro [28] for root-finding problems and by Ermoliev for stochastic programs [8-10], significant effort has
been devoted to the theoretical and algorithmic examination of such schemes (cf. [4,21,34]). Yet, there has
been markedly little work on the application of such techniques to the solution of stochastic variational inequalities,
exceptions being [14,19]. Standard stochastic approximation schemes provide little guidance regarding the
choice of a steplength sequence, denoted by $\{\gamma_k\}$, apart from requiring that the sequence satisfies

$$\sum_{k=0}^{\infty} \gamma_k = \infty \quad \text{and} \quad \sum_{k=0}^{\infty} \gamma_k^2 < \infty.$$

The behavior of stochastic approximation schemes is closely tied to the choice of steplength sequences.
Generally, two avenues have been traversed in choosing steplengths: (i) Deterministic steplength sequences:
Spall [34, Ch. 4, pg. 113] considered diverse choices of the form $\gamma_k = \frac{\beta}{(k + 1 + a)^{\alpha}}$, where $\beta > 0$,
$0 < \alpha \le 1$, and $a \ge 0$ is a stability constant. In related work in the context of approximate dynamic
programming, Powell [27] examined several deterministic update rules. However, many of these rules are
not accompanied by convergence theory. (ii) Stochastic steplength sequences: An alternative to a deterministic
rule is a stochastic scheme that updates steplengths based on observed data. Of note is recent work by
George et al. [12], where an adaptive stepsize rule is proposed that minimizes the mean squared error. In a
similar vein, Cicek et al. [7] develop an adaptive Kiefer-Wolfowitz SA algorithm and derive general upper
bounds on its mean-squared error.
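As an illustration of the deterministic family attributed to Spall above, the following minimal Python sketch evaluates $\gamma_k = \beta/(k+1+a)^{\alpha}$ and accumulates the two partial sums that appear in the classical steplength conditions. The parameter values are illustrative assumptions; note that square summability additionally requires $\alpha > 1/2$.

```python
def spall_steplength(k: int, beta: float, a: float, alpha: float) -> float:
    """Deterministic steplength gamma_k = beta / (k + 1 + a)**alpha."""
    if beta <= 0 or a < 0 or not (0 < alpha <= 1):
        raise ValueError("require beta > 0, a >= 0 and 0 < alpha <= 1")
    return beta / (k + 1 + a) ** alpha


# Illustrative (assumed) parameters: beta = 1, stability constant a = 10, alpha = 1.
steps = [spall_steplength(k, beta=1.0, a=10.0, alpha=1.0) for k in range(100_000)]
s1 = sum(steps)                  # grows like log(K): not summable
s2 = sum(g * g for g in steps)   # converges: square summable
print(f"sum gamma_k = {s1:.3f}, sum gamma_k^2 = {s2:.5f}")
```

With $\alpha = 1$ the first sum keeps growing logarithmically in the number of terms, while the second levels off near $\sum_{m \ge 11} 1/m^2 \approx 0.095$.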

Before proceeding, we note the relationship of the present work to three specific references. In [19], 
Cartesian stochastic variational inequality problems with Lipschitzian mappings were considered with a focus 
towards integrating Tikhonov and prox-based regularization techniques with standard stochastic gradient 
methods. However, the steplength sequences were "non-adaptive" since the choices did not adapt to problem 
parameters. Two problem-specific adaptive rules were developed in our earlier work on stochastic convex
programming [41]. Additionally, local smoothing techniques were examined there for addressing the lack of smoothness.
Of these rules, the first, referred to as the recursive steplength SA scheme, forms the inspiration for a generalization
pursued in the current work. Finally, in [40], we extended this recursive rule to accommodate stochastic
variational inequality problems. Note that the qualifier "adaptive" implies that the steplength rule adapts
to problem parameters such as the Lipschitz constant, the monotonicity constant and the diameter of the set. In
this paper, our goal lies in developing a distributed adaptive stochastic approximation (DASA) scheme that
can accommodate networked multi-agent implementations and cope with non-Lipschitzian mappings. More
specifically, the main contributions of this paper are as follows: 

(i) DASA schemes for Lipschitzian CSVIs: We begin with a simple extension of the adaptive steplength
rule presented in [41] to the variational regime under a Lipschitzian requirement on the map. Yet,
this rule has to be prescribed by a central coordinator, which is challenging in multi-agent settings, and this motivates the need for distributed
counterparts that can be employed on Cartesian problems. Such a distributed rule is developed and produces
sequences of iterates that are guaranteed to converge to the solution in an almost-sure sense.

(ii) DASA schemes for non-Lipschitzian CSVIs: Our second goal lies in addressing the absence or 
unavailability of a Lipschitz constant by leveraging locally randomized smoothing techniques, again inspired 
by our efforts to solve nonsmooth stochastic optimization problems [41]. In this part of the paper, we 
generalize this natively centralized scheme for optimization problems to a distributed version that can cope 
with Cartesian stochastic variational inequality problems. 

The remainder of this paper is organized as follows. In Section 2, we provide a canonical formulation for 
the problem of interest and motivate this formulation through two sets of examples. An adaptive steplength 
SA scheme for stochastic variational inequality problems with Lipschitzian mappings and its distributed 
generalization are provided in Section 3. By leveraging a locally randomized smoothing technique, in Section 4,
we extend these schemes to a regime where Lipschitzian assumptions do not hold. Finally, the paper 
concludes with some preliminary numerics in Section 5. 

Notation: Throughout this paper, a vector x is assumed to be a column vector. We write $x^T$ to
denote the transpose of a vector x, $\|x\|$ to denote the Euclidean vector norm, i.e., $\|x\| = \sqrt{x^T x}$, $\|x\|_1$ to
denote the 1-norm, i.e., $\|x\|_1 = \sum_{i=1}^{n} |x_i|$ for $x \in \mathbb{R}^n$, and $\|x\|_\infty$ to denote the infinity vector norm, i.e.,
$\|x\|_\infty = \max_{i=1,\ldots,n} |x_i|$ for $x \in \mathbb{R}^n$. We use $\Pi_X(x)$ to denote the Euclidean projection of a vector x on a
set X, i.e., $\|x - \Pi_X(x)\| = \min_{y \in X} \|x - y\|$. For a convex function f with domain $\mathrm{dom} f$, a vector g is a
subgradient of f at $\bar{x} \in \mathrm{dom} f$ if $f(\bar{x}) + g^T(x - \bar{x}) \le f(x)$ holds for all $x \in \mathrm{dom} f$. The set of all subgradients
of f at $\bar{x}$ is denoted by $\partial f(\bar{x})$. We write a.s. as the abbreviation for "almost surely". We use Prob(A) to
denote the probability of an event A and $\mathbb{E}[z]$ to denote the expectation of a random variable z. The Matlab
notation $(u_1; u_2; u_3)$ refers to a column vector with components $u_1$, $u_2$ and $u_3$, respectively.
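To make the projection notation concrete, the sketch below computes $\Pi_X(x)$ when each $X_i$ is a box, an assumption chosen only because the projection then has the closed form of componentwise clipping. For a Cartesian product $X = \prod_i X_i$ the Euclidean projection decomposes into independent projections of the blocks, which is the property exploited by the distributed scheme (4).

```python
import numpy as np


def project_box(x, lower, upper):
    """Euclidean projection onto the box {y : lower <= y <= upper} (componentwise clipping)."""
    return np.clip(x, lower, upper)


def project_cartesian(blocks, boxes):
    """Projection onto X = X_1 x ... x X_N: project each block x_i onto its own box X_i
    and stack the results, i.e. Pi_X(x) = (Pi_{X_1}(x_1); ...; Pi_{X_N}(x_N))."""
    return np.concatenate([project_box(b, lo, hi) for b, (lo, hi) in zip(blocks, boxes)])


# Hypothetical example: X_1 = [0, 1]^2 and X_2 = [-1, 2].
x_blocks = [np.array([1.5, -0.2]), np.array([3.0])]
boxes = [(0.0, 1.0), (-1.0, 2.0)]
p = project_cartesian(x_blocks, boxes)
print(p)
```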

2 Formulation and source problems 

In Section 2.1, we formulate the Cartesian stochastic variational inequality (CSVI) problem and outline the 
stochastic approximation algorithmic framework. A motivation for studying CSVIs is provided through two 
examples in Section 2.2, while a review of the main assumptions is given in Section 2.3. 

2.1 Problem formulation and algorithm outline 

Given a set $X \subseteq \mathbb{R}^n$ and a mapping $F : X \to \mathbb{R}^n$, the variational inequality problem, denoted by VI(X, F),
requires determining a vector $x^* \in X$ such that $(x - x^*)^T F(x^*) \ge 0$ holds for all $x \in X$. When the underlying
set X is given by a Cartesian product, as articulated by the definition $X = \prod_{i=1}^{N} X_i$ where $X_i \subseteq \mathbb{R}^{n_i}$, then
the associated variational inequality is qualified as a Cartesian variational inequality problem. Now suppose
that $x^* = (x_1^*; x_2^*; \ldots; x_N^*) \in X$ satisfies the following system of inequalities:

$$(x_i - x_i^*)^T\, \mathbb{E}[\phi_i(x^*, \xi_i)] \ge 0 \quad \text{for all } x_i \in X_i \text{ and all } i = 1, \ldots, N, \qquad (3)$$

where $\xi_i : \Omega_i \to \mathbb{R}^{d_i}$ is a random vector with some probability distribution for $i = 1, \ldots, N$. Naturally,
problem (3) may be reduced to VI(X, F) by noting that F may be defined as in (2), where $n = \sum_{i=1}^{N} n_i$ and
$F : X \to \mathbb{R}^n$. Then, VI(X, F) is a stochastic variational inequality problem on the Cartesian product of the
sets $X_i$ with a solution $x^* = (x_1^*; x_2^*; \ldots; x_N^*)$.

Much of the interest in this paper pertains to the development of stochastic approximation schemes for
VI(X, F) when the components of the map F are defined by (2). For such a problem, we consider the following
distributed stochastic approximation scheme:

$$\begin{aligned} x_{k+1,i} &= \Pi_{X_i}\left(x_{k,i} - \gamma_{k,i}\,(F_i(x_k) + w_{k,i})\right), \\ w_{k,i} &= \phi_i(x_k, \xi_{k,i}) - F_i(x_k), \end{aligned} \qquad (4)$$

for all $k \ge 0$ and $i = 1, \ldots, N$, where $F_i(x) = \mathbb{E}[\phi_i(x, \xi_i)]$ for $i = 1, \ldots, N$, $\gamma_{k,i} > 0$ is the stepsize for the i-th
index at iteration k, $x_{k,i}$ denotes the iterate for the i-th index at iteration k, and $x_k = (x_{k,1}; x_{k,2}; \ldots; x_{k,N})$.
Moreover, $x_0 \in X$ is a random initial vector independent of any other random variables in the scheme.
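The following Python sketch implements one synchronous pass of scheme (4) under simplifying assumptions that are not part of the formulation: every $X_i$ is an interval, $F(x) = Ax + b$ is an affine strongly monotone map (the matrix A and vector b are hypothetical), and the sampled map adds Gaussian noise, so that $\phi_i(x_k, \xi_{k,i}) = F_i(x_k) + w_{k,i}$. Each agent uses its own harmonic stepsize $\gamma_{k,i} = \theta_i/(k+1)$, chosen only for illustration.

```python
import numpy as np


def distributed_sa_step(x_blocks, sample_maps, gammas, boxes, rng):
    """One iteration of scheme (4): each agent i evaluates its sampled map
    phi_i(x_k, xi_{k,i}) at the shared current iterate x_k, takes a step with
    its own stepsize gamma_{k,i}, and projects onto its own set X_i (here a box)."""
    x_k = np.concatenate(x_blocks)
    new_blocks = []
    for x_i, phi_i, g_i, (lo, hi) in zip(x_blocks, sample_maps, gammas, boxes):
        sample = phi_i(x_k, rng)                 # equals F_i(x_k) + w_{k,i}
        new_blocks.append(np.clip(x_i - g_i * sample, lo, hi))
    return new_blocks


# Hypothetical strongly monotone affine map F(x) = A x + b with two scalar agents.
A = np.array([[2.0, 0.5], [-0.5, 2.0]])
b = np.array([-1.0, -2.0])
sample_maps = [
    lambda x, r, i=i: (A @ x + b)[i:i + 1] + 0.5 * r.standard_normal(1)
    for i in range(2)
]
boxes = [(0.0, 2.0), (0.0, 2.0)]
theta = np.array([0.5, 0.6])                     # per-agent harmonic constants (assumed)

rng = np.random.default_rng(0)
x = [np.array([2.0]), np.array([0.0])]
for k in range(20_000):
    x = distributed_sa_step(x, sample_maps, theta / (k + 1), boxes, rng)

x_star = np.linalg.solve(A, -b)                  # interior solution, so F(x*) = 0
print(np.concatenate(x), x_star)
```

Because the solution of this toy instance lies in the interior of the box, it coincides with the root of $Ax + b$, which gives a simple check of the sketch.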
2.2 Motivating examples

We consider two problems that can be addressed by the Cartesian stochastic variational inequality framework.

Example 1 (Networked stochastic Nash-Cournot game). A classical example of a Nash game is a networked
Nash-Cournot game [15,24]. Suppose a collection of N firms compete over a network of M nodes, wherein
the production and sales for firm i at node j are denoted by $g_{ij}$ and $s_{ij}$, respectively. Suppose firm i's cost of
production at node j is given by the uncertain cost function $c_{ij}(g_{ij}, \xi)$. Furthermore, goods sold by firm
i at node j fetch a random revenue defined by $p_j(s_j, \xi)\, s_{ij}$, where $p_j(s_j, \xi)$ denotes the uncertain sales price
at node j and $s_j = \sum_{i=1}^{N} s_{ij}$ denotes the aggregate sales at node j. Finally, firm i's production at node j is
capacitated by $\mathrm{cap}_{ij}$, and its optimization problem is given by the following¹:



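$$\begin{aligned} \text{minimize} \quad & \mathbb{E}[f_i(x, \xi)] \\ \text{subject to} \quad & x_i \in X_i, \end{aligned}$$

where $x = (x_1; \ldots; x_N)$ with $x_i = (g_i; s_i)$, $g_i = (g_{i1}; \ldots; g_{iM})$, $s_i = (s_{i1}; \ldots; s_{iM})$, and

$$f_i(x, \xi) \triangleq \sum_{j=1}^{M} \left( c_{ij}(g_{ij}, \xi) - p_j(s_j, \xi)\, s_{ij} \right),$$

$$X_i \triangleq \left\{ (g_i, s_i) \;\middle|\; \sum_{j=1}^{M} g_{ij} = \sum_{j=1}^{M} s_{ij},\; g_{ij}, s_{ij} \ge 0,\; g_{ij} \le \mathrm{cap}_{ij},\; j = 1, \ldots, M \right\}. \qquad \square$$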



Under the validity of the interchange between the expectation and the derivative operator, the resulting
equilibrium conditions of this stochastic Nash-Cournot game are compactly captured by the variational
inequality VI(X, F), where $X \triangleq \prod_{i=1}^{N} X_i$ and $F(x) = (F_1(x); \ldots; F_N(x))$ with $F_i(x) = \mathbb{E}[\nabla_{x_i} f_i(x, \xi)]$.
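The sketch below evaluates the sampled map $\nabla_{x_i} f_i(x, \xi)$ for Example 1 under hypothetical functional forms that the example itself leaves unspecified: quadratic costs $c_{ij}(g_{ij}) = a_{ij} g_{ij} + \tfrac{1}{2} q_{ij} g_{ij}^2$ and a linear inverse demand $p_j(s_j, \xi) = \xi_j - b_j s_j$ with a random intercept. A central finite difference on $f_i$ is included as a sanity check of the derivative with respect to $s_{ij}$.

```python
import numpy as np


def firm_objective(i, g, s, xi, a, q, b):
    """f_i(x, xi) = sum_j [c_ij(g_ij) - p_j(s_j, xi) * s_ij] under the assumed forms."""
    s_total = s.sum(axis=0)                               # aggregate sales s_j
    cost = a[i] * g[i] + 0.5 * q[i] * g[i] ** 2
    revenue = (xi - b * s_total) * s[i]
    return float(np.sum(cost - revenue))


def sampled_cournot_map(g, s, xi, a, q, b):
    """Rows i of (dg, ds) are the gradients of f_i w.r.t. g_i and s_i."""
    s_total = s.sum(axis=0)
    dg = a + q * g                                        # d f_i / d g_ij
    ds = -(xi - b * s_total) + b * s                      # -p_j(s_j, xi) + b_j * s_ij
    return dg, ds


rng = np.random.default_rng(1)
N, M = 3, 2                                               # firms, nodes (illustrative)
a, q = rng.uniform(1, 2, (N, M)), rng.uniform(0.5, 1, (N, M))
b = rng.uniform(0.5, 1.5, M)
g, s = rng.uniform(0, 1, (N, M)), rng.uniform(0, 1, (N, M))
xi = 10.0 + rng.standard_normal(M)                        # one sample of the random intercept

dg, ds = sampled_cournot_map(g, s, xi, a, q, b)

# Central finite difference of f_0 with respect to s_{0,1}.
h = 1e-6
s_plus, s_minus = s.copy(), s.copy()
s_plus[0, 1] += h
s_minus[0, 1] -= h
fd = (firm_objective(0, g, s_plus, xi, a, q, b)
      - firm_objective(0, g, s_minus, xi, a, q, b)) / (2 * h)
print(fd, ds[0, 1])
```

Because this assumed map is affine in $\xi$, averaging it over samples of $\xi$ recovers $F_i(x)$ evaluated at $\mathbb{E}[\xi]$; for general price and cost functions no such shortcut is available, which is what motivates the stochastic approximation approach.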

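Example 2 (Stochastic composite minimization problem). Consider a generalized min-max optimization
problem given by

$$\begin{aligned} \text{minimize} \quad & g(\psi_1(x), \ldots, \psi_m(x)) \\ \text{subject to} \quad & x \in X = \prod_{i=1}^{N} X_i, \end{aligned} \qquad (5)$$

where $g(u_1, \ldots, u_m)$ is defined as

$$g(u_1, \ldots, u_m) = \max_{y \in Y} \left\{ \sum_{i=1}^{m} u_i^T (A_i y + b_i) - \beta(y) \right\}, \qquad (6)$$

while $\psi_i(x) = \mathbb{E}[\phi_i(x, \xi)]$, $\nabla_{x_j} \psi_i(x) = \mathbb{E}[H_{ji}(x, \xi)]$ for $i = 1, \ldots, m$, and $\beta(y)$ is a Lipschitz continuous convex
function of $y$. $\square$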

Under the assumption that the derivative and the expectation operator can be interchanged, it can
be seen that the solution to this optimization problem can be obtained by solving a Cartesian stochastic
variational inequality problem VI(X × Y, F), where

$$F(x, y) = \begin{pmatrix} \sum_{i=1}^{m} \nabla_{x_1} \psi_i(x)\,(A_i y + b_i) \\ \vdots \\ \sum_{i=1}^{m} \nabla_{x_N} \psi_i(x)\,(A_i y + b_i) \\ \vdots \end{pmatrix}.$$

Note that the specification that $\psi_i(x)$ and its Jacobian are expectation-valued may be a consequence of not
having access to noise-free evaluations of either object. In particular, one only has access to evaluations
$\phi_i(x, \xi)$ and Jacobian evaluations given by $H_{ji}(x, \xi) = \nabla_{x_j} \phi_i(x, \xi)$.



¹Note that the transportation costs are assumed to be zero.
2.3 Assumptions

Our interest lies in the development of distributed stochastic approximation schemes for Cartesian stochastic
variational inequality problems, as espoused by (4), and the associated global convergence theory in regimes
where the mappings are single-valued mappings that are not necessarily Lipschitz continuous. We let

$$X = \prod_{i=1}^{N} X_i$$

and make the following assumptions on the set X and the mapping F.
Assumption 1. Assume the following: 

(a) The set $X_i \subseteq \mathbb{R}^{n_i}$ is closed and convex for $i = 1, \ldots, N$.

(b) The mapping F(x) is single-valued and Lipschitz continuous over the set X with a constant L.

(c) The mapping F(x) is strongly monotone with a constant $\eta > 0$:

$$(F(x) - F(y))^T (x - y) \ge \eta \|x - y\|^2 \quad \text{for all } x, y \in X.$$

Since F is continuous and strongly monotone over the closed convex set X, the existence and uniqueness of the solution to VI(X, F) is guaranteed by
Theorem 2.3.3 of [11]. We let $x^*$ denote the solution of VI(X, F).

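Regarding the method in (4), we let $\mathcal{F}_k$ denote the history of the method up to time k, i.e., $\mathcal{F}_k =
\{x_0, \xi_0, \xi_1, \ldots, \xi_{k-1}\}$ for $k \ge 1$ and $\mathcal{F}_0 = \{x_0\}$, where $\xi_k = (\xi_{k,1}; \xi_{k,2}; \ldots; \xi_{k,N})$. In terms of this definition,
we note that

$$\mathbb{E}[w_{k,i} \mid \mathcal{F}_k] = \mathbb{E}[\phi_i(x_k, \xi_{k,i}) \mid \mathcal{F}_k] - F_i(x_k) = 0 \quad \text{for all } k \ge 0 \text{ and all } i.$$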
We impose some further conditions on the stochastic errors $w_{k,i}$ of the algorithm, as follows.
Assumption 2. The errors $w_k = (w_{k,1}; w_{k,2}; \ldots; w_{k,N})$ are such that, for some (deterministic) $\nu > 0$,

$$\mathbb{E}[\|w_k\|^2 \mid \mathcal{F}_k] \le \nu^2 \quad \text{a.s. for all } k \ge 0.$$

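We use the following lemma in establishing the convergence of method (4) and its extensions. This result
may be found in [26] (cf. Lemma 10, page 49).

Lemma 1. Let $\{v_k\}$ be a sequence of nonnegative random variables, where $\mathbb{E}[v_0] < \infty$, and let $\{\alpha_k\}$ and
$\{\mu_k\}$ be deterministic scalar sequences such that:

$$\mathbb{E}[v_{k+1} \mid v_0, \ldots, v_k] \le (1 - \alpha_k) v_k + \mu_k \quad \text{a.s. for all } k \ge 0,$$

$$0 \le \alpha_k \le 1, \quad \mu_k \ge 0 \quad \text{for all } k \ge 0,$$

$$\sum_{k=0}^{\infty} \alpha_k = \infty, \quad \sum_{k=0}^{\infty} \mu_k < \infty, \quad \lim_{k \to \infty} \frac{\mu_k}{\alpha_k} = 0.$$

Then, $v_k \to 0$ almost surely, $\lim_{k \to \infty} \mathbb{E}[v_k] = 0$, and for any $\epsilon > 0$ and for all $k > 0$,

$$\mathrm{Prob}\left(\{v_j \le \epsilon \text{ for all } j \ge k\}\right) \ge 1 - \frac{1}{\epsilon}\left(\mathbb{E}[v_k] + \sum_{i=k}^{\infty} \mu_i\right).$$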



3 Distributed adaptive SA schemes for Lipschitzian mappings 

In this section, we restrict our attention to settings where the mapping F(x) is a single-valued Lipschitzian
map. In Section 3.1, we begin by developing an adaptive steplength rule for deriving steplength sequences
from problem parameters such as the monotonicity constant and the Lipschitz constant, where the qualifier adaptive
implies that the steplength choices "adapt" or are "self-tuned" to problem parameters. Unfortunately, in
distributed regimes, such a rule requires prescription by a central coordinator, a relatively challenging task
in multi-agent regimes. This motivates the development of a distributed counterpart of the aforementioned
adaptive rule, for which we provide convergence theory in Section 3.2.
3.1 An adaptive steplength SA (ASA) scheme



Stochastic approximation algorithms require stepsize sequences to be square summable but not summable.
These algorithms provide little advice regarding the choice of such sequences. One of the most common
choices has been the harmonic steplength rule, which takes the form $\gamma_k = \theta/k$, where $\theta > 0$ is a constant.
Although this choice guarantees almost-sure convergence, it does not leverage problem parameters. Numerically,
it has been observed that such choices can perform quite poorly in practice. Motivated by this
shortcoming, we present a steplength scheme for a centralized variant of algorithm (4):

$$\begin{aligned} x_{k+1} &= \Pi_X\left(x_k - \gamma_k (F(x_k) + w_k)\right), \\ w_k &= \Phi(x_k, \xi_k) - F(x_k), \end{aligned} \qquad (7)$$

for $k \ge 0$, where $\Phi \triangleq (\phi_1; \ldots; \phi_N)$. The proposed scheme derives a rule for updating steplength sequences that adapts to problem
parameters while guaranteeing almost-sure convergence of $x_k$ to the unique solution of VI(X, F).

A key challenge in practical implementations of stochastic approximation lies in choosing an appropriate
diminishing steplength sequence $\{\gamma_k\}$. In [41], we developed a rule for selecting such a sequence in a convex
stochastic optimization regime by leveraging three parameters: (i) the Lipschitz constant of the gradients; (ii) the
strong convexity constant; and (iii) the diameter of the set X. Along similar directions, such a rule is constructed
here for strongly monotone stochastic variational inequality problems, and the results in this subsection bear
significant similarity to those presented in [41], with some key distinctions. First, these results are presented
for strongly monotone stochastic variational inequality problems; second, co-coercivity of the mappings
is not assumed, leading to a tighter requirement on the choice of steplengths.

Lemma 2. Consider algorithm (7), and let Assumptions 1 and 2 hold. Then, the following relation holds
almost surely for all $k \ge 0$:

$$\mathbb{E}[\|x_{k+1} - x^*\|^2 \mid \mathcal{F}_k] \le (1 - 2\eta\gamma_k + L^2\gamma_k^2)\|x_k - x^*\|^2 + \gamma_k^2 \nu^2. \qquad (8)$$

Proof. Since $x^*$ solves VI(X, F), we have $x^* = \Pi_X(x^* - \gamma_k F(x^*))$. By the definition of algorithm (7) and the
non-expansiveness property of the projection operator, we then have for all $k \ge 0$,

$$\begin{aligned} \|x_{k+1} - x^*\|^2 &= \|\Pi_X(x_k - \gamma_k(F(x_k) + w_k)) - \Pi_X(x^* - \gamma_k F(x^*))\|^2 \\ &\le \|x_k - x^* - \gamma_k(F(x_k) + w_k - F(x^*))\|^2. \end{aligned}$$

Taking expectations conditioned on the past, and employing $\mathbb{E}[w_k \mid \mathcal{F}_k] = 0$, we have

$$\begin{aligned} \mathbb{E}[\|x_{k+1} - x^*\|^2 \mid \mathcal{F}_k] &\le \|x_k - x^*\|^2 + \gamma_k^2\|F(x_k) - F(x^*)\|^2 + \gamma_k^2\,\mathbb{E}[\|w_k\|^2 \mid \mathcal{F}_k] \\ &\quad - 2\gamma_k (x_k - x^*)^T (F(x_k) - F(x^*)) \\ &\le (1 - 2\eta\gamma_k + \gamma_k^2 L^2)\|x_k - x^*\|^2 + \gamma_k^2 \nu^2, \end{aligned}$$

where the second inequality is a consequence of the strong monotonicity and Lipschitz continuity of F(x)
over X as well as the boundedness of $\mathbb{E}[\|w_k\|^2 \mid \mathcal{F}_k]$. $\square$

The upper bound (8) can be used to construct an adaptive stepsize rule. Note that inequality (8) holds
for any $\gamma_k > 0$. When the stepsize is further restricted so that $0 < \gamma_k \le \eta/L^2$, we have $\gamma_k L^2 \le \eta$ and hence

$$1 - \gamma_k(2\eta - \gamma_k L^2) \le 1 - \eta\gamma_k.$$

Thus, for $0 < \gamma_k \le \eta/L^2$ and by taking expectations, inequality (8) reduces to

$$\mathbb{E}[\|x_{k+1} - x^*\|^2] \le (1 - \eta\gamma_k)\,\mathbb{E}[\|x_k - x^*\|^2] + \gamma_k^2 \nu^2 \quad \text{for all } k \ge 0. \qquad (9)$$

We begin by viewing $\mathbb{E}[\|x_{k+1} - x^*\|^2]$ as an error $e_{k+1}$ arising from employing the stepsize sequence $\gamma_0, \gamma_1, \ldots, \gamma_k$.
Furthermore, the worst-case error arises when (9) holds as an equality and satisfies the following recursive
relation:

$$e_{k+1}(\gamma_0, \ldots, \gamma_k) = (1 - \eta\gamma_k)\, e_k(\gamma_0, \ldots, \gamma_{k-1}) + \gamma_k^2 \nu^2.$$
Motivated by this relationship, our interest lies in examining whether the stepsizes $\gamma_0, \gamma_1, \ldots, \gamma_k$ can be
chosen so as to minimize the error $e_{k+1}$. Our goal lies in constructing a stepsize scheme that allows for claiming
the almost-sure convergence of the sequence $\{x_k\}$ produced by algorithm (7) to the unique solution of
VI(X, F). We formalize this approach by defining real-valued error functions $e_k(\gamma_0, \ldots, \gamma_{k-1})$ as follows:

$$e_k(\gamma_0, \ldots, \gamma_{k-1}) = (1 - \eta\gamma_{k-1})\, e_{k-1}(\gamma_0, \ldots, \gamma_{k-2}) + \gamma_{k-1}^2 \nu^2 \quad \text{for } k \ge 1, \qquad (10)$$

where $e_0$ is a positive scalar, $\eta$ is the strong monotonicity constant and $\nu^2$ is an upper bound for the second
moments of the error norms $\|w_k\|$. We consider a choice of $\{\gamma_0, \gamma_1, \ldots, \gamma_{k-1}\}$ based on minimizing an upper
bound on the mean-squared error, namely $e_k(\gamma_0, \gamma_1, \ldots, \gamma_{k-1})$, as captured by the following optimization
problem:

$$\begin{aligned} \text{minimize} \quad & e_k(\gamma_0, \ldots, \gamma_{k-1}) \\ \text{subject to} \quad & 0 < \gamma_j \le \frac{\eta}{L^2} \quad \text{for all } j = 0, \ldots, k-1. \end{aligned}$$

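To ensure convergence in an almost-sure sense, the sequence $\{\gamma_k\}$ needs to satisfy $\sum_{j=0}^{\infty} \gamma_j = \infty$ and
$\sum_{j=0}^{\infty} \gamma_j^2 < \infty$. As the next two propositions show, these can indeed be achieved. In fact, the error $e_{k+1}$ at
the next iteration can also be minimized by selecting $\gamma_k$ as a function of only the most recent stepsize $\gamma_{k-1}$.
In what follows, we consider the sequence $\{\gamma_k^*\}$ given by

$$\gamma_0^* = \frac{\eta}{2\nu^2}\, e_0, \qquad (11)$$

$$\gamma_k^* = \gamma_{k-1}^*\left(1 - \frac{\eta}{2}\,\gamma_{k-1}^*\right) \quad \text{for all } k \ge 1. \qquad (12)$$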
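The recursion (11)-(12) is straightforward to compute. The Python sketch below, with hypothetical constants $\eta$, $\nu$, L and $e_0$, generates $\gamma_k^*$, evaluates the worst-case error recursion (10) along it, and can be used to check numerically the identity $e_k = (2\nu^2/\eta)\gamma_k^*$ stated in Proposition 1(a).

```python
def asa_steplengths(eta, nu, e0, K):
    """Stepsizes (11)-(12): gamma_0 = eta*e0/(2*nu^2),
    gamma_k = gamma_{k-1} * (1 - (eta/2) * gamma_{k-1})."""
    gammas = [eta * e0 / (2 * nu**2)]
    for _ in range(1, K):
        g = gammas[-1]
        gammas.append(g * (1 - 0.5 * eta * g))
    return gammas


def worst_case_errors(gammas, eta, nu, e0):
    """Error recursion (10): e_{k+1} = (1 - eta*gamma_k) * e_k + gamma_k^2 * nu^2."""
    errs = [e0]
    for g in gammas:
        errs.append((1 - eta * g) * errs[-1] + g * g * nu**2)
    return errs


eta, nu, L, e0 = 1.0, 2.0, 2.5, 1.0        # hypothetical constants with e0 <= 2 nu^2 / L^2
K = 50
gam = asa_steplengths(eta, nu, e0, K)
err = worst_case_errors(gam, eta, nu, e0)
assert gam[0] <= eta / L**2                 # feasibility of the first (largest) stepsize
gap = max(abs(err[k] - 2 * nu**2 / eta * gam[k]) for k in range(K))
print(f"max |e_k - (2 nu^2/eta) gamma_k| = {gap:.2e}")
```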

We provide a result showing that the stepsizes $\gamma_i^*$, $i = 0, \ldots, k-1$, minimize $e_k$ over $(0, \eta/L^2]^k$, where L is the
Lipschitz constant associated with F(x) over X.

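Proposition 1 (An adaptive steplength SA (ASA) scheme). Let the error function $e_k(\gamma_0, \ldots, \gamma_{k-1})$ be
defined as in (10), where $e_0 > 0$ is such that $\nu \ge L\sqrt{e_0/2}$ (so that $\gamma_0^* \le \eta/L^2$), where L is the Lipschitz constant of F. Let the
sequence $\{\gamma_k^*\}$ be given by (11)-(12). Then, the following hold:

(a) For all $k \ge 0$, the error $e_k$ satisfies $e_k(\gamma_0^*, \ldots, \gamma_{k-1}^*) = \frac{2\nu^2}{\eta}\,\gamma_k^*$.

(b) For each $k \ge 1$, the vector $(\gamma_0^*, \gamma_1^*, \ldots, \gamma_{k-1}^*)$ is the minimizer of the function $e_k(\gamma_0, \ldots, \gamma_{k-1})$ over the
set

$$G_k \triangleq \left\{ a \in \mathbb{R}^k : 0 < a_j \le \frac{\eta}{L^2} \text{ for } j = 1, \ldots, k \right\}.$$

More precisely, for any $k \ge 1$ and any $(\gamma_0, \ldots, \gamma_{k-1}) \in G_k$, we have

$$e_k(\gamma_0, \ldots, \gamma_{k-1}) - e_k(\gamma_0^*, \ldots, \gamma_{k-1}^*) \ge \nu^2 (\gamma_{k-1} - \gamma_{k-1}^*)^2.$$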
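Proposition 1(b) can be probed numerically. The sketch below, again with hypothetical constants, draws random feasible stepsize vectors from $G_k$ and checks the stated lower bound on $e_k(\gamma_0, \ldots, \gamma_{k-1}) - e_k(\gamma_0^*, \ldots, \gamma_{k-1}^*)$; this is a sanity check, not a proof.

```python
import random


def final_error(gammas, eta, nu, e0):
    """e_k obtained by running recursion (10) over the given stepsizes."""
    e = e0
    for g in gammas:
        e = (1 - eta * g) * e + g * g * nu**2
    return e


eta, nu, L, e0, k = 1.0, 2.0, 2.5, 1.0, 20   # hypothetical constants
cap = eta / L**2                             # upper end of the set G_k

# Optimal stepsizes from (11)-(12).
opt = [eta * e0 / (2 * nu**2)]
for _ in range(1, k):
    opt.append(opt[-1] * (1 - 0.5 * eta * opt[-1]))
e_opt = final_error(opt, eta, nu, e0)

random.seed(1)
violations = 0
for _ in range(10_000):
    trial = [random.uniform(1e-9, cap) for _ in range(k)]
    excess = final_error(trial, eta, nu, e0) - e_opt
    if excess < nu**2 * (trial[-1] - opt[-1]) ** 2 - 1e-12:
        violations += 1
print("violations:", violations)
```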

The almost-sure convergence of the produced sequence holds for a family of steplength rules, as captured
by the following result.

Proposition 2 (Almost-sure convergence of ASA scheme). Let Assumptions 1 and 2 hold. Assume that the
stepsize sequence $\{\gamma_k\}$ is generated by the following adaptive scheme:

$$\gamma_k = \gamma_{k-1}(1 - c\,\gamma_{k-1}) \quad \text{for all } k \ge 1,$$

where $c > 0$ is a scalar and the initial stepsize is such that $0 < \gamma_0 < 1/c$. Then, the sequence $\{x_k\}$ generated
by algorithm (7) converges almost surely to the unique solution $x^*$ of VI(X, F).

The proofs of Propositions 1 and 2 are omitted, as they follow from more general results for a distributed
SA method, as discussed in the next subsection.
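One way to see why the recursive rule of Proposition 2 satisfies the classical steplength conditions is to observe numerically that, for $0 < \gamma_0 < 1/c$, the sequence stays positive, decreases, and behaves like $1/(ck)$ for large k, which is not summable but is square summable. The sketch below, with an illustrative c and $\gamma_0$, tracks $c\,k\,\gamma_k$, which should approach 1.

```python
def recursive_steplengths(gamma0, c, K):
    """gamma_k = gamma_{k-1} * (1 - c * gamma_{k-1}) for k = 1, ..., K-1."""
    if not (0 < gamma0 < 1 / c):
        raise ValueError("require 0 < gamma0 < 1/c")
    out, g = [], gamma0
    for _ in range(K):
        out.append(g)
        g = g * (1 - c * g)
    return out


c, K = 0.5, 200_000                       # illustrative values
steps = recursive_steplengths(gamma0=1.0, c=c, K=K)
scaled_tail = c * (K - 1) * steps[-1]     # c * k * gamma_k at the last index
print(f"c*k*gamma_k at k={K - 1}: {scaled_tail:.6f}")
print(f"partial sums: {sum(steps):.2f}, {sum(g * g for g in steps):.4f}")
```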
3.2 A distributed adaptive steplength SA (DASA) scheme

Unfortunately, in multi-agent regimes, the implementation of the stepsize rule (11)-(12) requires a central
coordinator who can prescribe and enforce such rules. In this section, we extend the centralized rule to
accommodate a multi-agent setting wherein each agent chooses its own update rule, given global knowledge
of problem parameters. In such a regime, given that the set X is the Cartesian product of closed and
convex sets $X_1, \ldots, X_N$, our interest lies in developing steplength update rules in the context of method (4),
where the i-th agent chooses its steplength, denoted by $\gamma_{k,i}$, as per

$$\gamma_{k,i} = \gamma_{k-1,i}(1 - c_i\,\gamma_{k-1,i}),$$

with $c_i > 0$ being a constant associated with agent i's mapping $F_i(x)$, while the initial stepsize $\gamma_{0,i}$ is suitably
selected. The following assumption imposes requirements on the stepsizes $\gamma_{k,i}$ in (4).

Assumption 3. Assume that the following hold:

(3a) The stepsize sequences $\{\gamma_{k,i}\}$, $i = 1, \ldots, N$, are such that $\gamma_{k,i} > 0$ for all k and i, with $\sum_{k=0}^{\infty} \gamma_{k,i} = \infty$
and $\sum_{k=0}^{\infty} \gamma_{k,i}^2 < \infty$ for all i.

(3b) If $\{\delta_k\}$ and $\{\Gamma_k\}$ are positive sequences such that $\delta_k \le \min_{1 \le i \le N} \gamma_{k,i}$ and $\Gamma_k \ge \max_{1 \le i \le N} \gamma_{k,i}$ for all
$k \ge 0$, then

$$\frac{\Gamma_k - \delta_k}{\delta_k} \le \beta \quad \text{for all } k \ge 0,$$

where $\beta$ is a scalar satisfying $0 \le \beta < \eta/L$.

Remark: Assumption (3a) is a standard requirement on steplength sequences, while Assumption (3b)
provides an additional condition on the discrepancy between the stepsize values $\gamma_{k,i}$ at each iteration k. This
condition is satisfied, for instance, when $\gamma_{k,1} = \ldots = \gamma_{k,N}$, in which case $\beta = 0$.
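Assumption (3b) can be checked empirically when each agent uses its own recursive rule. The Python sketch below takes $\delta_k = \min_i \gamma_{k,i}$ and $\Gamma_k = \max_i \gamma_{k,i}$ and reports the largest ratio $(\Gamma_k - \delta_k)/\delta_k$ over a finite horizon; the constants $c_i$, $\gamma_{0,i}$, $\eta$ and L are hypothetical. Since each recursion behaves like $1/(c_i k)$, with equal initial stepsizes the ratio tends to $\max_i c_i / \min_i c_i - 1$, so the choice of the $c_i$ directly controls whether $\beta < \eta/L$ can hold.

```python
def empirical_beta(gamma0s, cs, K):
    """max over k < K of (Gamma_k - delta_k) / delta_k for per-agent rules
    gamma_{k,i} = gamma_{k-1,i} * (1 - c_i * gamma_{k-1,i}),
    with delta_k = min_i gamma_{k,i} and Gamma_k = max_i gamma_{k,i}."""
    gammas = list(gamma0s)
    worst = 0.0
    for _ in range(K):
        lo, hi = min(gammas), max(gammas)
        worst = max(worst, (hi - lo) / lo)
        gammas = [g * (1 - c * g) for g, c in zip(gammas, cs)]
    return worst


same = empirical_beta([0.1, 0.1], [1.0, 1.0], K=1_000)      # identical agents: beta = 0
mixed = empirical_beta([0.1, 0.1], [1.0, 1.5], K=100_000)   # heterogeneous c_i
eta, L = 1.0, 1.5                                           # hypothetical constants
print(same, mixed, mixed < eta / L)
```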

When deriving an adaptive rule, we use Lemma 1 and a distributed generalization of Lemma 2, which is 
given below. 

Lemma 3. Consider algorithm (4). Let Assumptions 1 and 2 hold.

(a) The following relation holds almost surely for all $k \ge 0$:

$$\mathbb{E}[\|x_{k+1} - x^*\|^2 \mid \mathcal{F}_k] \le (1 - 2(\eta + L)\delta_k + 2L\Gamma_k + L^2\Gamma_k^2)\|x_k - x^*\|^2 + \Gamma_k^2 \nu^2,$$

where $\{\delta_k\}$ and $\{\Gamma_k\}$ are positive sequences such that $\delta_k \le \min_{1 \le i \le N} \gamma_{k,i}$ and $\Gamma_k \ge \max_{1 \le i \le N} \gamma_{k,i}$ for
all k.

(b) If Assumption (3b) holds, then the following relation is valid for all $k \ge 0$:

$$\mathbb{E}[\|x_{k+1} - x^*\|^2] \le (1 - 2(\eta - \beta L)\delta_k + (1 + \beta)^2 L^2 \delta_k^2)\,\mathbb{E}[\|x_k - x^*\|^2] + (1 + \beta)^2 \delta_k^2 \nu^2.$$

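Proof. (a) From the properties of the projection operator, we know that a vector $x^*$ solves VI(X, F)
if and only if $x^* = \Pi_X(x^* - \gamma F(x^*))$ for any $\gamma > 0$; since X is a Cartesian product, this holds blockwise, i.e.,
$x_i^* = \Pi_{X_i}(x_i^* - \gamma_{k,i} F_i(x^*))$ for every i. By the definition of algorithm (4) and the
non-expansiveness property of the projection operator, we have for all $k \ge 0$ and all i,

$$\begin{aligned} \|x_{k+1,i} - x_i^*\|^2 &= \|\Pi_{X_i}(x_{k,i} - \gamma_{k,i}(F_i(x_k) + w_{k,i})) - \Pi_{X_i}(x_i^* - \gamma_{k,i} F_i(x^*))\|^2 \\ &\le \|x_{k,i} - x_i^* - \gamma_{k,i}(F_i(x_k) + w_{k,i} - F_i(x^*))\|^2. \end{aligned}$$

Taking the expectation conditioned on the past, and using $\mathbb{E}[w_{k,i} \mid \mathcal{F}_k] = 0$, we have

$$\begin{aligned} \mathbb{E}[\|x_{k+1,i} - x_i^*\|^2 \mid \mathcal{F}_k] &\le \|x_{k,i} - x_i^*\|^2 + \gamma_{k,i}^2\|F_i(x_k) - F_i(x^*)\|^2 + \gamma_{k,i}^2\,\mathbb{E}[\|w_{k,i}\|^2 \mid \mathcal{F}_k] \\ &\quad - 2\gamma_{k,i}(x_{k,i} - x_i^*)^T (F_i(x_k) - F_i(x^*)). \end{aligned}$$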
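Now, by summing the preceding relations over i, we have

$$\begin{aligned} \mathbb{E}[\|x_{k+1} - x^*\|^2 \mid \mathcal{F}_k] &\le \|x_k - x^*\|^2 + \sum_{i=1}^{N} \gamma_{k,i}^2\|F_i(x_k) - F_i(x^*)\|^2 + \sum_{i=1}^{N} \gamma_{k,i}^2\,\mathbb{E}[\|w_{k,i}\|^2 \mid \mathcal{F}_k] \\ &\quad - 2\sum_{i=1}^{N} \gamma_{k,i}(x_{k,i} - x_i^*)^T (F_i(x_k) - F_i(x^*)). \end{aligned}$$

Using $\gamma_{k,i} \le \Gamma_k$ and Assumption 2, we can see that $\sum_{i=1}^{N} \gamma_{k,i}^2\,\mathbb{E}[\|w_{k,i}\|^2 \mid \mathcal{F}_k] \le \Gamma_k^2 \nu^2$ almost surely for all
$k \ge 0$. Thus, from the preceding relation, we have

$$\mathbb{E}[\|x_{k+1} - x^*\|^2 \mid \mathcal{F}_k] \le \|x_k - x^*\|^2 + \underbrace{\sum_{i=1}^{N} \gamma_{k,i}^2\|F_i(x_k) - F_i(x^*)\|^2}_{\text{Term 1}} + \Gamma_k^2\nu^2 \underbrace{- 2\sum_{i=1}^{N} \gamma_{k,i}(x_{k,i} - x_i^*)^T (F_i(x_k) - F_i(x^*))}_{\text{Term 2}}. \qquad (13)$$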

Next, we estimate Term 1 and Term 2 in (13). By using the definition of $\Gamma_k$ and by leveraging the Lipschitzian
property of mapping F, we obtain

$$\text{Term 1} \le \Gamma_k^2\|F(x_k) - F(x^*)\|^2 \le \Gamma_k^2 L^2\|x_k - x^*\|^2. \qquad (14)$$

By adding and subtracting $-2\sum_{i=1}^{N} \delta_k (x_{k,i} - x_i^*)^T (F_i(x_k) - F_i(x^*))$ from Term 2, and using $\sum_{i=1}^{N}(x_{k,i} -
x_i^*)^T (F_i(x_k) - F_i(x^*)) = (x_k - x^*)^T (F(x_k) - F(x^*))$, we further obtain

$$\text{Term 2} \le -2\delta_k (x_k - x^*)^T (F(x_k) - F(x^*)) - 2\sum_{i=1}^{N} (\gamma_{k,i} - \delta_k)(x_{k,i} - x_i^*)^T (F_i(x_k) - F_i(x^*)).$$

By the Cauchy-Schwarz inequality and $\gamma_{k,i} - \delta_k \ge 0$, the preceding relation yields

$$\begin{aligned} \text{Term 2} &\le -2\delta_k (x_k - x^*)^T (F(x_k) - F(x^*)) + 2\sum_{i=1}^{N} (\gamma_{k,i} - \delta_k)\|x_{k,i} - x_i^*\|\,\|F_i(x_k) - F_i(x^*)\| \\ &\le -2\delta_k (x_k - x^*)^T (F(x_k) - F(x^*)) + 2(\Gamma_k - \delta_k)\|x_k - x^*\|\,\|F(x_k) - F(x^*)\|, \end{aligned}$$

where in the last relation, we use the definition of $\Gamma_k$ and Hölder's inequality. Invoking strong monotonicity
of the mapping F for bounding the first term and the Lipschitzian property of F for bounding the second term
of the preceding relation, we have

$$\text{Term 2} \le -2\eta\delta_k\|x_k - x^*\|^2 + 2(\Gamma_k - \delta_k)L\|x_k - x^*\|^2. \qquad (15)$$

The desired inequality is obtained by combining relations (13), (14), and (15). 

(b) Assumption 3b implies that $\Gamma_k \le (1 + \beta)\delta_k$. Combining this observation with the result of part (a), we
obtain almost surely for all $k \ge 0$,

$$\mathbb{E}[\|x_{k+1} - x^*\|^2 \mid \mathcal{F}_k] \le (1 - 2(\eta - \beta L)\delta_k + (1 + \beta)^2 L^2 \delta_k^2)\|x_k - x^*\|^2 + (1 + \beta)^2 \delta_k^2 \nu^2.$$

Taking expectations in the preceding inequality, we obtain the desired relation. $\square$
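The deterministic part of the bound in Lemma 3(a) can be spot-checked for an affine map. In the sketch below, $F(x) = Ax + b$ with a hypothetical matrix A whose symmetric part is diagonal, each of four agents controls one coordinate with its own stepsize, the projection and the noise are dropped, and the code compares $\|(I - DA)(x_k - x^*)\|^2$, where $D = \mathrm{diag}(\gamma_{k,i})$, with the coefficient $1 - 2(\eta + L)\delta_k + 2L\Gamma_k + L^2\Gamma_k^2$ times $\|x_k - x^*\|^2$.

```python
import numpy as np

# Hypothetical affine map F(x) = A x + b; the skew part cancels in (A + A^T)/2.
A = np.array([[3.0, 1.0, 0.0, 0.0],
              [-1.0, 3.0, 0.5, 0.0],
              [0.0, -0.5, 2.0, 1.0],
              [0.0, 0.0, -1.0, 2.0]])
eta = np.linalg.eigvalsh(0.5 * (A + A.T)).min()   # strong monotonicity constant (= 2 here)
L = np.linalg.norm(A, 2)                           # Lipschitz constant: spectral norm of A

rng = np.random.default_rng(3)
worst = -np.inf
for _ in range(1_000):
    gam = rng.uniform(0.01, 0.2, size=4)           # gamma_{k,i}, one per agent
    delta, Gamma = gam.min(), gam.max()
    d = rng.standard_normal(4)                     # d = x_k - x*
    lhs = np.sum(((np.eye(4) - np.diag(gam) @ A) @ d) ** 2)
    coef = 1 - 2 * (eta + L) * delta + 2 * L * Gamma + (L * Gamma) ** 2
    worst = max(worst, lhs - coef * (d @ d))
print(f"eta = {eta:.3f}, L = {L:.3f}, max(lhs - bound) = {worst:.3e}")
```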

The following proposition proves the almost-sure convergence of the distributed SA scheme when the
steplength sequences satisfy the bounds prescribed by Assumption 3b.

Proposition 3 (Almost-sure convergence of distributed SA scheme). Let Assumptions 1, 2, and 3 hold.
Then, the sequence $\{x_k\}$ generated by algorithm (4) converges almost surely to the unique solution of
VI(X, F).
Proof. Consider the almost-sure relation established in the proof of Lemma 3(b), which follows from Lemma 3(a) and Assumption 3b.
For this relation, we show that the conditions of Lemma 1 are satisfied, which will allow us to claim the almost-sure convergence of $x_k$ to $x^*$.
Let us define $v_k \triangleq \|x_k - x^*\|^2$, and

$$\alpha_k \triangleq 2(\eta - \beta L)\delta_k - L^2\delta_k^2(1 + \beta)^2, \qquad \mu_k \triangleq (1 + \beta)^2\delta_k^2\nu^2. \qquad (16)$$

Next, we show that $0 < \alpha_k \le 1$ for k sufficiently large. Since $\gamma_{k,i}$ tends to zero for all $i = 1, \ldots, N$, we may
conclude that $\delta_k$ goes to zero as k grows. In turn, as $\delta_k$ goes to zero, for k large enough, say $k \ge k_1$, we have

$$\frac{(1 + \beta)^2 L^2 \delta_k}{2(\eta - \beta L)} < 1.$$

By Assumption 3b we have $\beta < \eta/L$, which implies $\eta - \beta L > 0$. Thus, we have $\alpha_k > 0$ for $k \ge k_1$. Also,
for k large enough, say $k \ge k_2$, we have $\alpha_k \le 1$ since $\delta_k \to 0$. Therefore, when $k \ge \max\{k_1, k_2\}$ we have
$0 < \alpha_k \le 1$, so Lemma 1 may be applied to the sequence starting from this index. Obviously, $\mu_k \ge 0$ for all k.

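From Assumption 3b we have $\delta_k \le \gamma_{k,i} \le (1 + \beta)\delta_k$ for all k and i. Using these relations and the conditions
on $\gamma_{k,i}$ given in Assumption 3a, we can show that $\sum_{k=0}^{\infty} \delta_k = \infty$ and $\sum_{k=0}^{\infty} \delta_k^2 < \infty$. Furthermore, from
the preceding properties of the sequence $\{\delta_k\}$ and the definitions of $\alpha_k$ and $\mu_k$ in (16), we can see that
$\sum_{k=0}^{\infty} \alpha_k = \infty$ and $\sum_{k=0}^{\infty} \mu_k < \infty$. Finally, by the definitions of $\alpha_k$ and $\mu_k$ we have

$$\lim_{k \to \infty} \frac{\mu_k}{\alpha_k} = \lim_{k \to \infty} \frac{(1 + \beta)^2 \delta_k \nu^2}{2(\eta - \beta L) - L^2 \delta_k (1 + \beta)^2} = \frac{(1 + \beta)^2 \nu^2 \lim_{k \to \infty} \delta_k}{2(\eta - \beta L)},$$

implying that $\lim_{k \to \infty} \mu_k/\alpha_k = 0$ since $\delta_k \to 0$. Hence, all conditions of Lemma 1 are satisfied and we may
conclude that $\|x_k - x^*\|^2 \to 0$ almost surely. $\square$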

Proposition 3 states that, under the specified assumptions on the set X and the mapping F, the stochastic errors
$w_k$, and the stepsizes $\gamma_{k,i}$, the distributed SA scheme is guaranteed to converge to the unique solution of
VI(X, F) almost surely. Our goal in the remainder of this section lies in providing a stepsize rule that aims
to minimize a suitably defined error function of the algorithm while satisfying Assumption 3. To begin our
analysis, we consider the result of Lemma 3(b) for all $k \ge 0$:

$$\mathbb{E}[\|x_{k+1} - x^*\|^2] \le (1 - 2(\eta - \beta L)\delta_k + (1 + \beta)^2 L^2 \delta_k^2)\,\mathbb{E}[\|x_k - x^*\|^2] + (1 + \beta)^2 \delta_k^2 \nu^2, \qquad (17)$$

where $\delta_k \le \min_{1 \le i \le N} \gamma_{k,i}$. When the stepsizes $\gamma_{k,i}$ are further restricted so that $0 < \delta_k \le \frac{\eta - \beta L}{(1 + \beta)^2 L^2}$, we have

$$1 - 2(\eta - \beta L)\delta_k + (1 + \beta)^2 L^2 \delta_k^2 \le 1 - (\eta - \beta L)\delta_k.$$

Thus, for $0 < \delta_k \le \frac{\eta - \beta L}{(1 + \beta)^2 L^2}$, from inequality (17) we obtain

$$\mathbb{E}[\|x_{k+1} - x^*\|^2] \le (1 - (\eta - \beta L)\delta_k)\,\mathbb{E}[\|x_k - x^*\|^2] + (1 + \beta)^2 \delta_k^2 \nu^2 \quad \text{for all } k \ge 0. \qquad (18)$$
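The step from (17) to (18) is a one-line comparison of quadratics in $\delta_k$; writing $a \triangleq \eta - \beta L > 0$ and $b \triangleq (1 + \beta)^2 L^2$ for brevity, it reads:

```latex
\begin{aligned}
1 - 2a\,\delta_k + b\,\delta_k^2 \le 1 - a\,\delta_k
  &\iff b\,\delta_k^2 \le a\,\delta_k \\
  &\iff \delta_k \le \frac{a}{b} = \frac{\eta - \beta L}{(1 + \beta)^2 L^2}
  \qquad (\text{since } \delta_k > 0).
\end{aligned}
```

Setting $\beta = 0$ recovers the centralized restriction $0 < \gamma_k \le \eta/L^2$ used to obtain (9).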

Similar to the discussion in Section 3.1 in the context of the ASA scheme, let us view the quantity
$\mathbb{E}[\|x_{k+1} - x^*\|^2]$ as an error $e_{k+1}$ of the method arising from the use of the lower bounds $\delta_0, \delta_1, \ldots, \delta_k$
for the stepsize values $\gamma_{0,i}, \gamma_{1,i}, \ldots, \gamma_{k,i}$, $i = 1, \ldots, N$. Relation (18) gives us an error estimate for algorithm
(4) in terms of the lower bounds $\delta_0, \delta_1, \ldots, \delta_k$. We use this estimate to develop an adaptive stepsize
procedure. Consider the case when (18) holds with equality, which is the worst-case error. In this case, the error
satisfies the following recursive relation:

$$e_{k+1} = (1 - (\eta - \beta L)\delta_k)\, e_k + (1 + \beta)^2 \nu^2 \delta_k^2 \quad \text{for all } k \ge 0.$$

Let us assume that we want to run the algorithm (4) for a fixed number of iterations, say K. The preceding 
relation shows that ck depends on the lower bound values up to the if th iteration. This motivates us to view 
the lower bounds Sq,S\,. . . , Sk-i as decision variables that can be used to minimize the corresponding upper 
bound on the mean-squared error of the algorithm up to iteration K. Thus, the variables are So, Si, . . . , 5k-i 
and the objective function is the error function e^(^o, Si, . . . ,5k-i)- We proceed to derive a rule for gen- 
erating lower bounds Sq, Si, ... ,5k by minimizing the error e^+i. Importantly, it turns out that Sk is a 
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function of only the most recent bound Sk-i- We define the real- valued error function e k (So, Si, • • • ,Sk-i) 
by considering an equality in (18): 

e fc+1 (£ , • • • , S k ) 4(1 - (77 - pL)5 k )e k (5 , . . . , + (1 + /? V** for all fc > 0, (19) 

where en is a positive scalar, {S k } is a sequence of positive scalars such that < S k < ^^2, L is the 
Lipschitz constant of the mapping F, n is the strong monotonicity parameter of F, and v 2 is the upper 
bound for the second moment of the error norms ||wjfe|| (cf. Assumption 2). 
Now let us consider the stepsize sequence {S^} given by 



St = SU ( 1 - ( 5—^1 *2-i ) for all fc > 1, (21) 



where en is the same initial error as for the errors e k in (19). In what follows, we often abbreviate 
e k (S$, Sk-i) by e k whenever this is unambiguous. The next proposition shows that the lower bound 
sequence {S%} for 7^ given by (20)-(21) minimizes the errors e k over [0, jj^^j^ Y • 
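To make rules (20)-(21) concrete, the following Python sketch computes $\delta_k^*$ and evaluates the worst-case error recursion (19) along it. The function names and the parameter values ($\eta = 1$, $L = 2$, $\beta = 0.2$, $\nu = 3$, $e_0 = 1$) are illustrative assumptions, chosen so that $\eta \le L$, $0 \le \beta < \eta/L$ and $\nu \ge L\sqrt{e_0/2}$ hold; the script only checks numerically that $e_k(\delta_0^*, \ldots, \delta_{k-1}^*) = \frac{2(1+\beta)^2\nu^2}{\eta - \beta L}\delta_k^*$, which is the identity stated in part (a) of Proposition 4.

```python
def adaptive_lower_bounds(eta, L, beta, nu, e0, K):
    """Return [delta*_0, ..., delta*_{K-1}] generated by rules (20)-(21)."""
    a = eta - beta * L                                     # effective modulus eta - beta*L
    delta = a * e0 / (2.0 * (1.0 + beta) ** 2 * nu ** 2)   # rule (20)
    seq = []
    for _ in range(K):
        seq.append(delta)
        delta = delta * (1.0 - 0.5 * a * delta)            # rule (21)
    return seq


def worst_case_error(deltas, eta, L, beta, nu, e0):
    """Evaluate e_K(delta_0, ..., delta_{K-1}) through the recursion (19)."""
    a = eta - beta * L
    e = e0
    for d in deltas:
        e = (1.0 - a * d) * e + (1.0 + beta) ** 2 * nu ** 2 * d ** 2
    return e


# Illustrative (assumed) parameters.
eta, L, beta, nu, e0 = 1.0, 2.0, 0.2, 3.0, 1.0
deltas = adaptive_lower_bounds(eta, L, beta, nu, e0, 51)
scale = 2.0 * (1.0 + beta) ** 2 * nu ** 2 / (eta - beta * L)

# Compare e_K along delta* with scale * delta*_K for K = 0, ..., 50.
gaps = [abs(worst_case_error(deltas[:K], eta, L, beta, nu, e0) - scale * deltas[K])
        for K in range(51)]
print("largest deviation:", max(gaps))
```

The sequence is strictly decreasing and behaves like $2/((\eta - \beta L)k)$ for large $k$, which is the familiar harmonic decay of SA stepsizes, here obtained from an explicit recursion rather than prescribed.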

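Proposition 4 (An adaptive lower bound steplength SA scheme). Let $e_k(\delta_0, \ldots, \delta_{k-1})$ be defined as in (19), where $e_0$ is a given positive scalar, $\nu$ is an upper bound defined in Assumption 2, $\eta$ and $L$ are the strong monotonicity and Lipschitz constants of the mapping $F$, respectively, and $\nu$ is chosen such that $\nu \ge L\sqrt{e_0/2}$. Let $\beta$ be a scalar such that $0 \le \beta < \eta/L$, and let the sequence $\{\delta_k^*\}$ be given by (20)-(21). Then, the following hold:

(a) For all $k \ge 0$, the error $e_k$ satisfies $e_k(\delta_0^*, \ldots, \delta_{k-1}^*) = \frac{2(1+\beta)^2\nu^2}{\eta - \beta L}\,\delta_k^*$.

(b) For any $k \ge 1$, the vector $(\delta_0^*, \delta_1^*, \ldots, \delta_{k-1}^*)$ is the minimizer of the function $e_k(\delta_0, \ldots, \delta_{k-1})$ over the set

$$G_k \triangleq \left\{\alpha \in \mathbb{R}^k : 0 < \alpha_j \le \frac{\eta - \beta L}{(1+\beta)^2L^2},\ j = 1, \ldots, k\right\}.$$

More precisely, for any $k \ge 1$ and any $(\delta_0, \ldots, \delta_{k-1}) \in G_k$, we have

$$e_k(\delta_0, \ldots, \delta_{k-1}) - e_k(\delta_0^*, \ldots, \delta_{k-1}^*) \ge (1+\beta)^2\nu^2\left(\delta_{k-1} - \delta_{k-1}^*\right)^2.$$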



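Proof. (a) To show the result, we use induction on $k$. Trivially, it holds for $k = 0$ from (20). Now, suppose that we have $e_k(\delta_0^*, \ldots, \delta_{k-1}^*) = \frac{2(1+\beta)^2\nu^2}{\eta - \beta L}\,\delta_k^*$ for some $k$, and consider the case for $k + 1$. From the definition of the error $e_k$ in (19), we have

$$\begin{aligned} e_{k+1}(\delta_0^*, \ldots, \delta_k^*) &= \left(1 - (\eta - \beta L)\delta_k^*\right)e_k(\delta_0^*, \ldots, \delta_{k-1}^*) + (1+\beta)^2\nu^2(\delta_k^*)^2 \\ &= \left(1 - (\eta - \beta L)\delta_k^*\right)\frac{2(1+\beta)^2\nu^2}{\eta - \beta L}\,\delta_k^* + (1+\beta)^2\nu^2(\delta_k^*)^2, \end{aligned}$$

where the second equality follows by the inductive hypothesis. Thus,

$$e_{k+1}(\delta_0^*, \ldots, \delta_k^*) = \frac{2(1+\beta)^2\nu^2}{\eta - \beta L}\,\delta_k^*\left(1 - \frac{\eta - \beta L}{2}\,\delta_k^*\right) = \frac{2(1+\beta)^2\nu^2}{\eta - \beta L}\,\delta_{k+1}^*,$$

where the last equality follows by the definition of $\delta_{k+1}^*$ in (21). Hence, the result holds for all $k \ge 0$.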
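(b) First we need to show that $(\delta_0^*, \ldots, \delta_{k-1}^*) \in G_k$. By our assumption on $\nu$, we have $0 < e_0 \le \frac{2\nu^2}{L^2}$, which by the definition of $\delta_0^*$ in (20) implies that $0 < \delta_0^* \le \frac{\eta - \beta L}{(1+\beta)^2L^2}$, i.e., $\delta_0^* \in G_1$. Using induction on $k$, from relations (20)-(21), it can be shown that $0 < \delta_k^* \le \delta_{k-1}^*$ for all $k \ge 1$. Thus, $(\delta_0^*, \ldots, \delta_{k-1}^*) \in G_k$ for all $k \ge 1$. Using induction on $k$ again, we now show that the vector $(\delta_0^*, \delta_1^*, \ldots, \delta_{k-1}^*)$ minimizes the error $e_k(\delta_0, \ldots, \delta_{k-1})$ for all $k \ge 1$. From the definition of the error $e_1$ and the relation $e_1(\delta_0^*) = \frac{2(1+\beta)^2\nu^2}{\eta - \beta L}\,\delta_1^*$ shown in part (a), we have

$$e_1(\delta_0) - e_1(\delta_0^*) = \left(1 - (\eta - \beta L)\delta_0\right)e_0 + (1+\beta)^2\nu^2\delta_0^2 - \frac{2(1+\beta)^2\nu^2}{\eta - \beta L}\,\delta_1^*.$$

Using $\delta_1^* = \delta_0^*\left(1 - \frac{\eta - \beta L}{2}\,\delta_0^*\right)$ (cf. (21)), we obtain

$$e_1(\delta_0) - e_1(\delta_0^*) = \left(1 - (\eta - \beta L)\delta_0\right)e_0 + (1+\beta)^2\nu^2\delta_0^2 - \frac{2(1+\beta)^2\nu^2}{\eta - \beta L}\,\delta_0^* + (1+\beta)^2\nu^2(\delta_0^*)^2.$$

Since $e_0 = \frac{2(1+\beta)^2\nu^2}{\eta - \beta L}\,\delta_0^*$ (cf. (20)), it follows that

$$e_1(\delta_0) - e_1(\delta_0^*) = -2(1+\beta)^2\nu^2\delta_0\delta_0^* + (1+\beta)^2\nu^2\delta_0^2 + (1+\beta)^2\nu^2(\delta_0^*)^2 = (1+\beta)^2\nu^2(\delta_0 - \delta_0^*)^2,$$

showing that the inductive hypothesis holds for $k = 1$. Now, suppose that

$$e_k(\delta_0, \ldots, \delta_{k-1}) - e_k(\delta_0^*, \ldots, \delta_{k-1}^*) \ge (1+\beta)^2\nu^2(\delta_{k-1} - \delta_{k-1}^*)^2 \tag{22}$$

holds for some $k$ and for all $(\delta_0, \ldots, \delta_{k-1}) \in G_k$. We next show that relation (22) holds for $k + 1$ and for all $(\delta_0, \ldots, \delta_k) \in G_{k+1}$. To simplify the notation, we use $e_{k+1}^*$ to denote the error $e_{k+1}$ evaluated at $(\delta_0^*, \delta_1^*, \ldots, \delta_k^*)$, and $e_{k+1}$ when evaluating at an arbitrary vector $(\delta_0, \ldots, \delta_k) \in G_{k+1}$. Using (19) and part (a), we have

$$e_{k+1} - e_{k+1}^* = \left(1 - (\eta - \beta L)\delta_k\right)e_k + (1+\beta)^2\nu^2\delta_k^2 - \frac{2(1+\beta)^2\nu^2}{\eta - \beta L}\,\delta_{k+1}^*.$$

Under the inductive hypothesis, we have $e_k \ge e_k^*$ (cf. (22)). When $(\delta_0, \ldots, \delta_k) \in G_{k+1}$, we have $\delta_k \le \frac{\eta - \beta L}{(1+\beta)^2L^2}$. Next, we show that $\frac{\eta - \beta L}{(1+\beta)^2L^2} \le \frac{1}{\eta - \beta L}$. By the definition of strong monotonicity and the Lipschitzian property, we have $\eta \le L$. Using $\eta \le L$ and $0 \le \beta < \eta/L$ we obtain

$$\eta \le (1+\beta)L \;\Rightarrow\; \eta - \beta L \le (1+\beta)L \;\Rightarrow\; (\eta - \beta L)^2 \le (1+\beta)^2L^2 \;\Rightarrow\; \frac{\eta - \beta L}{(1+\beta)^2L^2} \le \frac{1}{\eta - \beta L}.$$

This implies that for $(\delta_0, \ldots, \delta_k) \in G_{k+1}$, we have $\delta_k \le \frac{1}{\eta - \beta L}$, or equivalently $1 - (\eta - \beta L)\delta_k \ge 0$. Using this, the relation $e_k^* = \frac{2(1+\beta)^2\nu^2}{\eta - \beta L}\,\delta_k^*$ of part (a), and the definition of $\delta_{k+1}^*$, we obtain

$$\begin{aligned} e_{k+1} - e_{k+1}^* &\ge \left(1 - (\eta - \beta L)\delta_k\right)\frac{2(1+\beta)^2\nu^2}{\eta - \beta L}\,\delta_k^* + (1+\beta)^2\nu^2\delta_k^2 - \frac{2(1+\beta)^2\nu^2}{\eta - \beta L}\,\delta_k^*\left(1 - \frac{\eta - \beta L}{2}\,\delta_k^*\right) \\ &= (1+\beta)^2\nu^2(\delta_k - \delta_k^*)^2. \end{aligned}$$

Hence, we have $e_k - e_k^* \ge (1+\beta)^2\nu^2(\delta_{k-1} - \delta_{k-1}^*)^2$ for all $k \ge 1$ and all $(\delta_0, \ldots, \delta_{k-1}) \in G_k$. □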
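As a numerical sanity check of part (b), the sketch below draws vectors from $G_K$ (both uniformly at random and as small perturbations of $(\delta_0^*, \ldots, \delta_{K-1}^*)$) and counts how often the inequality $e_K(\delta) - e_K(\delta^*) \ge (1+\beta)^2\nu^2(\delta_{K-1} - \delta_{K-1}^*)^2$ fails. The parameter values, sample sizes and the tolerance are illustrative assumptions. A random search of this kind can only fail to find a counterexample; it does not replace the proof.

```python
import random


def adaptive_lower_bounds(eta, L, beta, nu, e0, K):
    a = eta - beta * L
    delta = a * e0 / (2.0 * (1.0 + beta) ** 2 * nu ** 2)   # rule (20)
    out = []
    for _ in range(K):
        out.append(delta)
        delta *= 1.0 - 0.5 * a * delta                     # rule (21)
    return out


def worst_case_error(deltas, eta, L, beta, nu, e0):
    a = eta - beta * L
    e = e0
    for d in deltas:                                       # recursion (19)
        e = (1.0 - a * d) * e + (1.0 + beta) ** 2 * nu ** 2 * d ** 2
    return e


eta, L, beta, nu, e0, K = 1.0, 2.0, 0.2, 3.0, 1.0, 30
upper = (eta - beta * L) / ((1.0 + beta) ** 2 * L ** 2)    # right end point of G_K
star = adaptive_lower_bounds(eta, L, beta, nu, e0, K)
e_star = worst_case_error(star, eta, L, beta, nu, e0)

rng = random.Random(0)
violations = 0
for trial_index in range(2000):
    if trial_index % 2 == 0:
        # an arbitrary point of G_K
        trial = [rng.uniform(1e-9, upper) for _ in range(K)]
    else:
        # a point of G_K close to the candidate minimizer
        trial = [min(upper, d * (1.0 + 0.05 * rng.uniform(-1.0, 1.0))) for d in star]
    gap = worst_case_error(trial, eta, L, beta, nu, e0) - e_star
    lower = (1.0 + beta) ** 2 * nu ** 2 * (trial[-1] - star[-1]) ** 2
    if gap < lower - 1e-12:
        violations += 1
print("violations:", violations)
```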

Remark: From Proposition 4, the minimizer $(\delta_0^*, \ldots, \delta_{k-1}^*)$ of $e_k$ over $G_k$ is unique up to a scaling by a factor $\rho \in (0, 1)$. Specifically, the solution $(\delta_0^*, \ldots, \delta_{k-1}^*)$ is obtained for an initial error $e_0 > 0$ satisfying $\nu \ge L\sqrt{e_0/2}$, where $e_0$ can be chosen to be arbitrarily large by scaling $\nu$ appropriately. Suppose that in the definition of the sequence $\{\delta_k^*\}$, $\rho e_0$ is employed instead of $e_0$ for some $\rho \in (0, 1)$. Then it can be seen (by following the proof) that, for the resulting sequence, Proposition 4 would still hold. □

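We have just provided an analysis in terms of a lower bound sequence $\{\delta_k\}$. We may conduct a similar analysis for an upper bound sequence $\{\Gamma_k\}$. In particular, from Lemma 3a we have

$$E[\|x_{k+1} - x^*\|^2] \le \left(1 - 2(\eta + L)\delta_k + 2L\Gamma_k + L^2\Gamma_k^2\right)E[\|x_k - x^*\|^2] + \Gamma_k^2\nu^2 \quad \text{for all } k \ge 0.$$

When $\frac{\Gamma_k - \delta_k}{\delta_k} \le \beta$ with $0 \le \beta < \eta/L$, we have $\delta_k \ge \frac{\Gamma_k}{1+\beta}$, and we obtain the following relation:

$$E[\|x_{k+1} - x^*\|^2] \le \left(1 - \frac{2(\eta + L)}{1+\beta}\Gamma_k + 2L\Gamma_k + L^2\Gamma_k^2\right)E[\|x_k - x^*\|^2] + \Gamma_k^2\nu^2,$$

whose coefficient simplifies to $1 - \frac{2(\eta - \beta L)}{1+\beta}\Gamma_k + L^2\Gamma_k^2$. When $\Gamma_k$ is further restricted so that $0 < \Gamma_k \le \frac{\eta - \beta L}{(1+\beta)L^2}$, we have

$$E[\|x_{k+1} - x^*\|^2] \le \left(1 - \frac{\eta - \beta L}{1+\beta}\Gamma_k\right)E[\|x_k - x^*\|^2] + \Gamma_k^2\nu^2 \quad \text{for all } k \ge 0.$$

Using the preceding relation and following a similar analysis as in the proof of Proposition 4, we can show that the optimal choice of the sequence $\{\Gamma_k\}$ is given by

$$\Gamma_0^* = \frac{\eta - \beta L}{2(1+\beta)\nu^2}\,e_0, \tag{23}$$

$$\Gamma_k^* = \Gamma_{k-1}^*\left(1 - \frac{\eta - \beta L}{2(1+\beta)}\,\Gamma_{k-1}^*\right) \quad \text{for all } k \ge 1, \tag{24}$$

where $e_0$ is such that $0 < e_0 \le \frac{2\nu^2}{L^2}$.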

In the following lemma, we derive a relation between two recursive sequences, which is employed within our main convergence result for adaptive stepsizes $\{\gamma_{k,i}\}$.

Lemma 4. Suppose that sequences $\{\lambda_k\}$ and $\{\gamma_k\}$ are given with the following recursive equations for all $k \ge 0$,

$$\lambda_{k+1} = \lambda_k(1 - \lambda_k), \qquad \gamma_{k+1} = \gamma_k(1 - c\gamma_k),$$

where $c > 0$ is a given constant and $\lambda_0 = c\gamma_0$. Then for all $k \ge 0$, $\lambda_k = c\gamma_k$.

Proof. We use induction on $k$. For $k = 0$, the relation holds since $\lambda_0 = c\gamma_0$. Suppose that for some $k \ge 0$ the relation holds. Then, we have

$$\gamma_{k+1} = \gamma_k(1 - c\gamma_k) \;\Rightarrow\; c\gamma_{k+1} = c\gamma_k(1 - c\gamma_k) \;\Rightarrow\; c\gamma_{k+1} = \lambda_k(1 - \lambda_k) \;\Rightarrow\; c\gamma_{k+1} = \lambda_{k+1}.$$

Hence, the result holds for $k + 1$, implying that it holds for all $k \ge 0$. □

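Using Lemma 4, we now present a relation between the lower and upper bound sequences given by $\{\delta_k^*\}$ and $\{\Gamma_k^*\}$, respectively.

Lemma 5. Suppose that the sequences $\{\delta_k^*\}$ and $\{\Gamma_k^*\}$ are given by relations (20)-(21) and (23)-(24), respectively, where $0 < e_0 \le \frac{2\nu^2}{L^2}$ and $0 \le \beta < \eta/L$. Then, for all $k \ge 0$, $\Gamma_k^* = (1+\beta)\delta_k^*$.

Proof. Suppose that $\{\lambda_k\}$ is defined by the following recursive equation

$$\lambda_{k+1} = \lambda_k(1 - \lambda_k) \quad \text{for all } k \ge 0,$$

where $\lambda_0 = \frac{(\eta - \beta L)^2}{4(1+\beta)^2\nu^2}\,e_0$. To obtain the result, we apply Lemma 4 to sequences $\{\lambda_k\}$ and $\{\delta_k^*\}$ and then to sequences $\{\lambda_k\}$ and $\{\Gamma_k^*\}$. Specifically, Lemma 4 implies that $\lambda_k = \frac{\eta - \beta L}{2}\,\delta_k^*$ for all $k \ge 0$. Invoking Lemma 4 for sequences $\{\lambda_k\}$ and $\{\Gamma_k^*\}$, we have $\lambda_k = \frac{\eta - \beta L}{2(1+\beta)}\,\Gamma_k^*$. From the preceding two relations, we conclude that $\Gamma_k^* = (1+\beta)\delta_k^*$ for all $k \ge 0$. □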
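The identity of Lemma 5, and the auxiliary sequence $\{\lambda_k\}$ of Lemma 4 used in its proof, can be checked in a few lines. The sketch below uses assumed parameter values that satisfy $0 < e_0 \le 2\nu^2/L^2$ and $0 \le \beta < \eta/L$; it generates $\{\delta_k^*\}$ from (20)-(21), $\{\Gamma_k^*\}$ from (23)-(24) and $\lambda_{k+1} = \lambda_k(1 - \lambda_k)$, and reports the largest relative deviations from $\Gamma_k^* = (1+\beta)\delta_k^*$ and $\lambda_k = \frac{\eta - \beta L}{2}\delta_k^*$.

```python
def lower_bounds(eta, L, beta, nu, e0, K):
    a = eta - beta * L
    d = a * e0 / (2.0 * (1.0 + beta) ** 2 * nu ** 2)   # rule (20)
    out = []
    for _ in range(K):
        out.append(d)
        d *= 1.0 - 0.5 * a * d                          # rule (21)
    return out


def upper_bounds(eta, L, beta, nu, e0, K):
    a = (eta - beta * L) / (1.0 + beta)
    g = a * e0 / (2.0 * nu ** 2)                        # rule (23)
    out = []
    for _ in range(K):
        out.append(g)
        g *= 1.0 - 0.5 * a * g                          # rule (24)
    return out


eta, L, beta, nu, e0, K = 1.0, 2.0, 0.2, 3.0, 1.0, 200  # assumed values
low = lower_bounds(eta, L, beta, nu, e0, K)
up = upper_bounds(eta, L, beta, nu, e0, K)

# Auxiliary sequence from the proof of Lemma 5.
lam = [(eta - beta * L) ** 2 * e0 / (4.0 * (1.0 + beta) ** 2 * nu ** 2)]
for _ in range(K - 1):
    lam.append(lam[-1] * (1.0 - lam[-1]))

lemma5_err = max(abs(g - (1.0 + beta) * d) / d for g, d in zip(up, low))
lemma4_err = max(abs(l - 0.5 * (eta - beta * L) * d) / d for l, d in zip(lam, low))
print("Lemma 5 relative error:", lemma5_err, "Lemma 4 relative error:", lemma4_err)
```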

The relations (20)-(21) and (23)-(24), respectively, are essentially adaptive rules for determining the best lower and upper bounds for the stepsize sequences $\{\gamma_{k,i}\}$, where "best" corresponds to the minimizers of the associated error bounds. Having provided this intermediate result, we state our main result next; it shows the almost-sure convergence of the distributed adaptive SA (DASA) scheme.
Theorem 1 (A class of distributed adaptive steplength SA rules). Suppose that Assumptions 1 and 2 hold, and assume that the set $X$ is bounded. Suppose that, for all $i = 1, \ldots, N$, the stepsizes $\{\gamma_{k,i}\}$ in algorithm (4) are given by the following recursive equations:

$$\gamma_{0,i} = r_i c\,\frac{D^2}{\left(1 + \frac{\eta - 2c}{L}\right)^2\nu^2}, \tag{25}$$

$$\gamma_{k,i} = \gamma_{k-1,i}\left(1 - \frac{c}{r_i}\,\gamma_{k-1,i}\right) \quad \text{for all } k \ge 1, \tag{26}$$

where $D = \max_{x \in X}\|x - x_0\|$, $c$ is a scalar satisfying $c \in (0, \eta/2]$, $r_i$ is a parameter such that $r_i \in \left[1, 1 + \frac{\eta - 2c}{L}\right]$, $\eta$ is the strong monotonicity parameter of the mapping $F$, $L$ is the Lipschitz constant of $F$, and $\nu$ is the upper bound defined in Assumption 2. We assume that the constant $\nu$ is chosen large enough such that $\nu \ge \frac{DL}{\sqrt{2}}$. Then, the following hold:

(a) For any $i, j = 1, \ldots, N$ and $k \ge 0$, $\frac{\gamma_{k,i}}{r_i} = \frac{\gamma_{k,j}}{r_j}$.

(b) Assumption 3b holds with $\beta = \frac{\eta - 2c}{L}$, $\delta_k = \delta_k^*$, $\Gamma_k = \Gamma_k^*$, and $e_0 = D^2$, where $\delta_k^*$ and $\Gamma_k^*$ are given by (20)-(21) and (23)-(24), respectively.

(c) The sequence $\{x_k\}$ generated by algorithm (4) converges almost surely to the unique solution of VI$(X, F)$.

(d) The results of Proposition 4 hold for $\delta_k^*$ when $e_0 = D^2$ and $\beta = \frac{\eta - 2c}{L}$.
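A minimal sketch of how each agent could generate its own stepsizes from (25)-(26), together with a numerical check of parts (a) and (b): the ratios $\gamma_{k,i}/r_i$ coincide across agents, and every $\gamma_{k,i}$ lies between $\delta_k^*$ and $\Gamma_k^* = (1+\beta)\delta_k^*$ with $\beta = (\eta - 2c)/L$ and $e_0 = D^2$. The function name `dasa_stepsizes` and all numerical values ($\eta = 2$, $L = \sqrt{4.25}$, $c = 0.5$, $D = 5\sqrt{2}$, $r = (1, 1.2, 1.4)$) are hypothetical choices for illustration.

```python
import math


def dasa_stepsizes(eta, L, nu, D, c, r, K):
    """Stepsizes gamma_{k,i}, k = 0, ..., K-1, for each agent i via (25)-(26)."""
    beta = (eta - 2.0 * c) / L
    if not 0.0 < c <= eta / 2.0:
        raise ValueError("c must lie in (0, eta/2]")
    if any(ri < 1.0 or ri > 1.0 + beta for ri in r):
        raise ValueError("each r_i must lie in [1, 1 + (eta - 2c)/L]")
    seqs = []
    for ri in r:
        g = ri * c * D ** 2 / ((1.0 + beta) ** 2 * nu ** 2)   # rule (25)
        seq = []
        for _ in range(K):
            seq.append(g)
            g *= 1.0 - (c / ri) * g                           # rule (26)
        seqs.append(seq)
    return seqs


# Hypothetical parameters; nu is set to D L / sqrt(2), the smallest value allowed.
eta, L, c, K = 2.0, math.sqrt(4.25), 0.5, 500
D = 5.0 * math.sqrt(2.0)
nu = D * L / math.sqrt(2.0)
r = [1.0, 1.2, 1.4]
beta = (eta - 2.0 * c) / L
seqs = dasa_stepsizes(eta, L, nu, D, c, r, K)

# delta*_k from (20)-(21) with e0 = D^2; Gamma*_k = (1 + beta) delta*_k by Lemma 5.
a = eta - beta * L
delta = [a * D ** 2 / (2.0 * (1.0 + beta) ** 2 * nu ** 2)]
for _ in range(K - 1):
    delta.append(delta[-1] * (1.0 - 0.5 * a * delta[-1]))

spread = max(max(s[k] / ri for s, ri in zip(seqs, r)) - min(s[k] / ri for s, ri in zip(seqs, r))
             for k in range(K))
bracketed = all(delta[k] * (1.0 - 1e-9) <= min(s[k] for s in seqs)
                and max(s[k] for s in seqs) <= (1.0 + beta) * delta[k] * (1.0 + 1e-9)
                for k in range(K))
print("spread of gamma_{k,i}/r_i:", spread, "bracketed:", bracketed)
```

Each agent only needs the shared values $c$, $\eta$, $L$, $D$ and $\nu$ plus its private $r_i$; no communication about the current stepsizes is required.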
Proof. (a) Consider the sequence $\{\lambda_k\}$ given by

$$\lambda_0 = \frac{c^2}{\left(1 + \frac{\eta - 2c}{L}\right)^2\nu^2}\,D^2, \qquad \lambda_{k+1} = \lambda_k(1 - \lambda_k) \quad \text{for all } k \ge 0.$$

Since for any $i = 1, \ldots, N$ we have $\lambda_0 = (c/r_i)\gamma_{0,i}$, using Lemma 4 (with the constant $c/r_i$) we obtain $\lambda_k = (c/r_i)\gamma_{k,i}$ for all $i = 1, \ldots, N$ and $k \ge 0$. Hence, the desired relation follows.

(b) First we show that $\delta_k^*$ and $\Gamma_k^*$ are well defined. Consider the relation of part (a). Let $k \ge 0$ be arbitrarily fixed. If $\gamma_{k,i} > \gamma_{k,j}$ for some $i \ne j$, then we have $r_i > r_j$. Therefore, the minimum possible $\gamma_{k,i}$ is obtained with $r_i = 1$ and the maximum possible $\gamma_{k,i}$ is obtained with $r_i = 1 + \frac{\eta - 2c}{L}$. Now, consider (25)-(26). If $r_i = 1$, $D^2$ is replaced by $e_0$, and $c$ by $\frac{\eta - \beta L}{2}$, we get the same recursive sequence defined by (20)-(21). Therefore, since the minimum possible $\gamma_{k,i}$ is achieved when $r_i = 1$, we conclude that $\delta_k^* \le \min_{i=1,\ldots,N}\gamma_{k,i}$ for any $k \ge 0$. This shows that $\delta_k^*$ is well-defined in the context of Assumption 3b. Similarly, it can be shown that $\Gamma_k^*$ is also well-defined in the context of Assumption 3b. Now, Lemma 5 implies that $\Gamma_k^* = \left(1 + \frac{\eta - 2c}{L}\right)\delta_k^*$ for any $k \ge 0$, which shows that the inequality in Assumption 3b is satisfied with $\beta = \frac{\eta - 2c}{L}$, where $0 \le \beta < \eta/L$ since $0 < c \le \eta/2$.

(c) In view of Proposition 3, to show the almost-sure convergence, it suffices to show that Assumption 3 holds. Part (b) implies that Assumption 3b is satisfied by the given stepsize choices. As seen in Proposition 3 of [41], Assumption 3a holds for any positive recursive sequence $\{\lambda_k\}$ of the form $\lambda_{k+1} = \lambda_k(1 - a\lambda_k)$. Since each sequence $\{\gamma_{k,i}\}$ is a recursive sequence of this form, Assumption 3a follows from Proposition 3 in [41].

(d) It suffices to show that the hypotheses of Proposition 4 hold when $e_0 = D^2$ and $\beta = \frac{\eta - 2c}{L}$. Relation $\nu \ge L\sqrt{e_0/2}$ follows from $\nu \ge \frac{DL}{\sqrt{2}}$. Also, as shown in part (b), since $0 < c \le \eta/2$, the relation $0 \le \beta < \eta/L$ holds for any choice of $c$ within that range. Therefore, the conditions of Proposition 4 are satisfied. □

Remark: Theorem 1 provides a class of adaptive stepsize rules for the distributed SA algorithm (4), i.e., for any choice of parameter $c$ such that $0 < c \le \eta/2$, relations (25)-(26) correspond to an adaptive stepsize rule for agents $1, \ldots, N$. Note that if $c = \eta/2$, these adaptive rules will represent the centralized adaptive scheme given by (11)-(12). □
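To see the DASA scheme in action, the following Python sketch runs algorithm (4) with the stepsizes (25)-(26) on a small synthetic problem. Everything problem-specific is an assumption made for illustration: two agents with scalar decisions $x_i \in [0, 5]$, the affine map $F(x) = Ax - b$ with $A = \begin{pmatrix} 2 & 0.5 \\ -0.5 & 2 \end{pmatrix}$ (so $\eta = 2$ and $L = \sqrt{4.25}$), $b$ chosen so that $x^* = (1, 2)$, and additive Gaussian noise with standard deviation $0.5$ per component. The agents use different parameters $r_1 = 1$ and $r_2 = 1.4$ but share $c$, $\eta$ and $L$, as the theorem requires.

```python
import math
import random


def dasa(A, b, lo, hi, x0, r, c, eta, L, nu, sigma, K, seed=0):
    """Projected stochastic iteration (4) on the box [lo, hi]^n with F(x) = A x - b.

    Agent i owns the scalar x_i and uses its own stepsize sequence from (25)-(26).
    The oracle returns F_i(x) + w_i with w_i ~ N(0, sigma^2) (an assumed noise model)."""
    rng = random.Random(seed)
    n = len(x0)
    beta = (eta - 2.0 * c) / L
    D = math.sqrt(sum(max(hi - v, v - lo) ** 2 for v in x0))   # max_{x in X} ||x - x0||
    nu = max(nu, D * L / math.sqrt(2.0))                       # enforce nu >= D L / sqrt(2)
    gam = [ri * c * D ** 2 / ((1.0 + beta) ** 2 * nu ** 2) for ri in r]   # rule (25)
    x = list(x0)
    for _ in range(K):
        Fx = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        # each agent takes a projected step along its own noisy coordinate map
        x = [min(hi, max(lo, x[i] - gam[i] * (Fx[i] + rng.gauss(0.0, sigma))))
             for i in range(n)]
        gam = [g * (1.0 - (c / ri) * g) for g, ri in zip(gam, r)]         # rule (26)
    return x


A = [[2.0, 0.5], [-0.5, 2.0]]       # symmetric part 2I, so eta = 2; ||A|| = sqrt(4.25)
x_star = [1.0, 2.0]
b = [sum(A[i][j] * x_star[j] for j in range(2)) for i in range(2)]   # F(x_star) = 0
sigma = 0.5
x = dasa(A, b, lo=0.0, hi=5.0, x0=[0.0, 0.0], r=[1.0, 1.4], c=0.5,
         eta=2.0, L=math.sqrt(4.25), nu=math.sqrt(2.0) * sigma, sigma=sigma, K=20000)
print("final iterate:", x, "distance to x*:", math.dist(x, x_star))
```

Because $x^*$ lies in the interior of the box, it is the zero of $F$, which makes the final distance easy to interpret. The sketch enlarges $\nu$ to $DL/\sqrt{2}$ when needed, mirroring the standing assumption on $\nu$ in Theorem 1.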

In a distributed setting, each agent can choose its corresponding parameter $r_i$ from the specified range $\left[1, 1 + \frac{\eta - 2c}{L}\right]$. This requires that all agents agree on a fixed parameter $c$ and have a common estimate of the parameters $\eta$ and $L$. Yet, this scheme does not allow complete flexibility for the agents and requires some global specification of parameters such as $\eta$, $L$, and $c$. In the next section, we address the setting where the Lipschitz constant is unavailable globally or where the mapping $F$ may not be Lipschitzian.

4 Non-Lipschitzian mappings and local randomization 

A key shortcoming of the proposed DASA scheme, given by (25)-(26), is the requirement that the mapping $F$ be Lipschitzian with a known parameter $L$. However, in a range of problem settings, the following may arise:

• Unavailability of a Lipschitz constant: In many settings, either the mapping may be non-Lipschitzian or the estimation of such a constant may be problematic. It may also be the case that this constant is not available across the entire population of agents.

• Nonsmoothness in payoffs: Suppose the Cartesian stochastic variational inequality problem represents the optimality conditions of a stochastic convex program with nonsmooth (random) objectives or the equilibrium conditions of a stochastic Nash game in which the payoff functions are expectation-valued with random nonsmooth integrands. In either setting, the integrands associated with each component's expectation are multi-valued. In such a setting, a randomization or smoothing technique can be applied to each agent's payoff, leading to an approximate mapping that can be shown to be Lipschitz and single-valued. The associated Lipschitz constant can be specified in terms of problem parameters and smoothing specifications, allowing us to develop a locally randomized SA algorithm for stochastic variational inequalities without Lipschitzian mappings.

In Section 4.1, we present the rudiments of our randomization approach and discuss its generalizations 
in Section 4.2. Finally, in Section 4.3, we present a distributed locally randomized SA scheme and provide 
suitable convergence theory. 

4.1 A randomized smoothing technique 

In this part, we revisit a smoothing technique that has its roots in work by Steklov [36,37] in 1907. Over 
the years, it has been used by Bertsekas [3], Norkin [25] and more recently Lakshmanan and De Farias [22]. 
The following proposition in [3] presents this smoothing technique for a nondifferentiable convex function. 

Proposition 5. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function and consider the function

$$f^\epsilon(x) \triangleq E[f(x - \omega)],$$

where $\omega$ belongs to the probability space $(\mathbb{R}^n, \mathcal{B}^n, P)$, $\mathcal{B}^n$ is the $\sigma$-algebra of Borel sets of $\mathbb{R}^n$, and $P$ is a probability measure on $\mathcal{B}^n$ which is absolutely continuous with respect to Lebesgue measure restricted on $\mathcal{B}^n$. Then, if $E[f(x - \omega)] < \infty$ for all $x \in \mathbb{R}^n$, the function $f^\epsilon$ is everywhere differentiable.

This technique has been employed in a number of papers such as [13,22,41] to transform $f$ into a smooth function. In [22], the authors consider a Gaussian distribution for the smoothing distribution and show that when the function $f$ has bounded subgradients, the smooth function $f^\epsilon$ has Lipschitz gradients with a prescribed Lipschitz constant. A challenge in that approach is that in some situations, the function $f$ may have a restricted domain and not be defined for some realizations of the Gaussian random variable.

Motivated by this challenge, in [41], we consider the randomized smoothing technique using uniform random variables defined on an $n$-dimensional ball centered at the origin with radius $\epsilon > 0$. This approach is called the "locally randomized smoothing technique" and is used to establish a local smoothing SA algorithm for solving stochastic convex optimization problems in [41]. We intend to extend this smoothing technique to the regime of solving stochastic Cartesian variational inequality problems and exploit the Lipschitzian property of the approximated mapping. In the following example, we demonstrate how the smoothing technique works for a piecewise linear function.

Example 3 (Smoothing of a convex function). Consider the following piecewise linear function

$$f(x) = \begin{cases} -2x - 3 & \text{for } x < -2, \\ -0.3x + 0.4 & \text{for } -2 \le x \le 3, \\ x - 3.5 & \text{for } x > 3. \end{cases}$$
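A minimal Python sketch of the smoothing applied to this $f$, assuming (as in the discussion of Figure 1) that $z$ is uniform on $[-\epsilon, \epsilon]$, so that $f^\epsilon(x) = E[f(x + z)] = \frac{1}{2\epsilon}\int_{x-\epsilon}^{x+\epsilon} f(t)\,dt$. Rather than sampling, the sketch evaluates this average exactly through a continuous antiderivative of $f$; the integration constants $-3.4$ and $5.85$ are our own, chosen only to make the antiderivative continuous at $x = -2$ and $x = 3$.

```python
def f(x):
    """The piecewise linear convex function of Example 3."""
    if x < -2.0:
        return -2.0 * x - 3.0
    if x <= 3.0:
        return -0.3 * x + 0.4
    return x - 3.5


def antiderivative(t):
    """A continuous antiderivative of f (constants chosen for continuity at -2 and 3)."""
    if t < -2.0:
        return -t * t - 3.0 * t - 3.4
    if t <= 3.0:
        return -0.15 * t * t + 0.4 * t
    return 0.5 * t * t - 3.5 * t + 5.85


def f_eps(x, eps):
    """f^eps(x) = E[f(x + z)] for z uniform on [-eps, eps]: the average of f over [x - eps, x + eps]."""
    return (antiderivative(x + eps) - antiderivative(x - eps)) / (2.0 * eps)


for x in (-2.0, 0.0, 3.0):
    print(f"x = {x:5.1f}   f(x) = {f(x):8.4f}   f_eps(x) = {f_eps(x, 0.5):8.4f}")
```

At the kinks the smoothed value exceeds $f$ (for instance $f^\epsilon(-2) = 1.2125 > f(-2) = 1$ when $\epsilon = 0.5$), as Jensen's inequality predicts for a convex $f$ and a zero-mean perturbation; away from the kinks the two functions coincide.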
Figure 1: The smoothing technique. (a) The original function $f(x)$; (b) the smoothed function $f^\epsilon(x)$; (c) different smoothing parameters.

Suppose that $z$ is a uniform random variable defined on $[-\epsilon, \epsilon]$, where $\epsilon > 0$ is a given parameter. Consider the approximation function $f^\epsilon(x) = E[f(x + z)]$. Proposition 5 implies that $f^\epsilon$ is a smooth function. When $\epsilon$ is a fixed constant satisfying $0 < \epsilon < 2.5$, the smoothed function $f^\epsilon$ has the following form:

$$f^\epsilon(x) = \begin{cases} -2x - 3 & \text{for } x < -2 - \epsilon, \\ \frac{1}{40\epsilon}\left(17x^2 + 68x - 46x\epsilon + 68 - 52\epsilon + 17\epsilon^2\right) & \text{for } -2 - \epsilon \le x \le -2 + \epsilon, \\ -0.3x + 0.4 & \text{for } -2 + \epsilon < x < 3 - \epsilon, \\ \frac{1}{40\epsilon}\left(13x^2 - 78x + 14x\epsilon + 117 - 62\epsilon + 13\epsilon^2\right) & \text{for } 3 - \epsilon \le x \le 3 + \epsilon, \\ x - 3.5 & \text{for } x > 3 + \epsilon. \end{cases}$$

Figure 1 shows such a smoothing scheme. In Figure 1a, we observe that the function $f$ is nonsmooth at $x = -2$ and $x = 3$. Figure 1b shows the approximation $f^\epsilon$ when $\epsilon = 0.5$. An immediate observation is that the function $f^\epsilon$ is smooth everywhere. Furthermore, the smoothing technique perturbs $x$ locally at all points, including points of nonsmoothness. Finally, Figure 1c shows the smoothing scheme for different values of $\epsilon$ and illustrates the exactness of the approximation as $\epsilon \to 0$.

4.2 Locally randomized techniques

Motivated by the smoothing technique described in the previous part, we introduce two distributed smoothing schemes where we simultaneously perturb the value of the vectors $x_i$ with a random vector $z_i$ for $i = 1, \ldots, N$. The first scheme is called a multi-spherical randomized (MSR) scheme, where each random vector $z_i \in \mathbb{R}^{n_i}$ is uniformly distributed on the $n_i$-dimensional ball centered at the origin with radius $\epsilon_i$. In the second scheme, called a multi-cubic randomized (MCR) scheme, we let $z_i \in \mathbb{R}^{n_i}$ be uniformly distributed on the $n_i$-dimensional cube centered at the origin with an edge length of $2\epsilon_i$.

Now, consider a mapping $F$ that is not necessarily Lipschitz. We begin by defining an approximation $F^\epsilon : X \to \mathbb{R}^n$ as the expectation of $F(x)$ when $x$ is perturbed by a random vector $z = (z_1; \ldots; z_N)$. Specifically, $F^\epsilon$ is given by

$$F^\epsilon(x) \triangleq \begin{pmatrix} E[F_1(x + z)] \\ \vdots \\ E[F_N(x + z)] \end{pmatrix} \quad \text{for all } x \in X, \tag{27}$$

where $F_1, \ldots, F_N$ are the coordinate-maps of $F$, $z = (z_1; \ldots; z_N)$, and the random vectors $z_i$ are given by the MSR or MCR scheme.

4.2.1 Multi-spherical randomized smoothing

Let us define $B_n(x, \rho) \subset \mathbb{R}^n$ as a ball centered at a point $x$ with a radius $\rho > 0$. More precisely,

$$B_n(x, \rho) \triangleq \{y \in \mathbb{R}^n \mid \|y - x\| \le \rho\}.$$
In this scheme, assume that for all $i = 1, \ldots, N$ the random vector $z_i \in B_{n_i}(0, \epsilon_i)$ is uniformly distributed and independent of the random vectors $z_j$ for $j \ne i$. For the approximation mapping $F^\epsilon$ to be well-defined, $F$ needs to be defined over the set $X_s^\epsilon$ given by

$$X_s^\epsilon \triangleq X + \prod_{i=1}^N B_{n_i}(0, \epsilon_i).$$

This means that $X_s^\epsilon = \{(x_1 + z_1, \ldots, x_N + z_N) \mid x \in X,\ z_i \in \mathbb{R}^{n_i},\ \|z_i\| \le \epsilon_i \text{ for all } i = 1, \ldots, N\}$, where the constants $\epsilon_i > 0$ are given values and $\epsilon = (\epsilon_1, \ldots, \epsilon_N)$. Note that the subscript $s$ stands for the MSR scheme. This scheme is developed based on the following assumption.

Assumption 4. The mapping $F : X_s^\epsilon \to \mathbb{R}^n$ is bounded over the set $X_s^\epsilon$. In particular, for every $i = 1, \ldots, N$, there exists a constant $C_i > 0$ such that $\|F_i(x)\| \le C_i$ for all $x \in X_s^\epsilon$.
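The MSR perturbation is straightforward to simulate. In the following Python sketch (the function names and the linear test map are hypothetical), each block $z_i$ is drawn uniformly from $B_{n_i}(0, \epsilon_i)$ by the standard construction: a normalized Gaussian direction scaled by $\epsilon_i U^{1/n_i}$ with $U$ uniform on $[0, 1)$. A single evaluation $F(x + z)$ is then an unbiased sample of $F^\epsilon(x)$ in (27). The checks use two facts: a uniform point of an $n$-ball lies in the concentric ball of half the radius with probability $2^{-n}$, and for a linear map $F^\epsilon = F$ because $z$ has zero mean.

```python
import math
import random


def sample_ball(n, radius, rng):
    """Uniform sample from the n-dimensional ball B_n(0, radius)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]          # isotropic direction
    norm = math.sqrt(sum(v * v for v in g))
    scale = radius * rng.random() ** (1.0 / n) / norm    # radial part radius * U**(1/n)
    return [scale * v for v in g]


def msr_perturbation(dims, eps, rng):
    """MSR scheme: independent blocks z_i uniform on B_{n_i}(0, eps_i)."""
    return [sample_ball(n_i, e_i, rng) for n_i, e_i in zip(dims, eps)]


def msr_sample(F, x_blocks, dims, eps, rng):
    """One unbiased sample of F^eps(x) in (27): F evaluated at the perturbed point x + z."""
    z = msr_perturbation(dims, eps, rng)
    return F([[xi + zi for xi, zi in zip(xb, zb)] for xb, zb in zip(x_blocks, z)])


def F_lin(blocks):
    """Hypothetical linear map used only for the check below."""
    return [sum(blocks[0]), 2.0 * blocks[1][0]]


rng = random.Random(1)
dims, eps = [3, 1], [0.5, 0.2]
draws = [msr_perturbation(dims, eps, rng) for _ in range(20000)]
norms = [[math.sqrt(sum(v * v for v in blk)) for blk in d] for d in draws]
frac_inner = sum(nm[0] <= 0.25 for nm in norms) / len(norms)   # should be near 1/8

x_blocks = [[1.0, -1.0, 0.5], [0.3]]
samples = [msr_sample(F_lin, x_blocks, dims, eps, rng) for _ in range(20000)]
avg = [sum(s[k] for s in samples) / len(samples) for k in range(2)]
print(frac_inner, avg, F_lin(x_blocks))
```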

Under this assumption, we will show that the smoothed mapping F e produced by the MSR scheme is 
Lipschitz continuous over X and we will compute its Lipschitz constant. To do so, we make use of the 
following lemma. 

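Lemma 6. Let $z \in \mathbb{R}^n$ be a random vector generated from a uniform density with zero mean over an $n$-dimensional ball centered at the origin with a radius $\epsilon$. Then, the following relation holds:

$$\int_{\mathbb{R}^n}|p_u(z - x) - p_u(z - y)|\,dz \le \kappa\,\frac{n!!}{(n-1)!!}\,\frac{\|x - y\|}{\epsilon} \quad \text{for all } x, y \in \mathbb{R}^n,$$

where $\kappa = 1$ if $n$ is odd and $\kappa = \frac{2}{\pi}$ if $n$ is even, $n!!$ denotes the double factorial of $n$, and $p_u$ is the probability density function of the random vector $z$ given by

$$p_u(z) = \begin{cases} \frac{1}{c_n\epsilon^n} & \text{for } z \in B_n(0, \epsilon), \\ 0 & \text{otherwise}, \end{cases} \tag{28}$$

where $c_n = \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)}$, and $\Gamma$ is the gamma function given by

$$\Gamma\left(\frac{n}{2} + 1\right) = \begin{cases} \left(\frac{n}{2}\right)! & \text{if } n \text{ is even}, \\ \sqrt{\pi}\,\frac{n!!}{2^{(n+1)/2}} & \text{if } n \text{ is odd}. \end{cases}$$

Proof. The result is shown within the proof of Lemma 8 in the extended version of [41]. □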
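Since both densities in Lemma 6 are uniform on balls of the same volume, the integral equals $2\,P(u \notin B_n(y, \epsilon))$ for $u$ uniform on $B_n(x, \epsilon)$. The Monte Carlo sketch below (sample sizes, seed and the test point $\|x - y\| = \epsilon = 1$ are illustrative assumptions) estimates this quantity for $n = 2, 3$ and compares it with the bound $\kappa\frac{n!!}{(n-1)!!}\frac{\|x - y\|}{\epsilon}$. For this test point the exact values are $2 - \frac{2}{\pi}\left(\frac{2\pi}{3} - \frac{\sqrt{3}}{2}\right) \approx 1.218$ for $n = 2$ (from the circle-intersection area) and $\frac{11}{8} = 1.375$ for $n = 3$ (from the lens volume $5\pi/12$), against bounds $4/\pi \approx 1.273$ and $1.5$.

```python
import math
import random


def double_factorial(m):
    """m!! with the conventions 0!! = (-1)!! = 1."""
    return 1 if m <= 0 else m * double_factorial(m - 2)


def lemma6_bound(n, eps, dist):
    """Right-hand side of Lemma 6 for ||x - y|| = dist."""
    kappa = 1.0 if n % 2 == 1 else 2.0 / math.pi
    return kappa * double_factorial(n) / (double_factorial(n - 1) * eps) * dist


def l1_gap_mc(n, eps, dist, samples, rng):
    """Monte Carlo estimate of int |p_u(z - x) - p_u(z - y)| dz with x = 0, y = dist * e_1."""
    outside = 0
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in g))
        s = eps * rng.random() ** (1.0 / n) / norm
        u = [s * v for v in g]        # u uniform on B_n(x, eps) with x = 0
        u[0] -= dist                  # shift to u - y
        if sum(v * v for v in u) > eps * eps:
            outside += 1
    return 2.0 * outside / samples


rng = random.Random(0)
results = {}
for n in (2, 3):
    results[n] = (l1_gap_mc(n, 1.0, 1.0, 50000, rng), lemma6_bound(n, 1.0, 1.0))
    print(n, results[n])
```

For small $\|x - y\|$ the bound is asymptotically tight; for $n = 1$ it holds with equality whenever $\|x - y\| \le 2\epsilon$.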

We next provide the main result of this subsection, which establishes the Lipschitz continuity and boundedness properties of the approximation mapping $F^\epsilon$. It also provides the Lipschitz constant of $F^\epsilon$ for the MSR scheme in terms of problem parameters.

Proposition 6 (Lipschitz continuity and boundedness of $F^\epsilon$ under the MSR scheme). Let Assumption 4 hold and define the vector $C = (C_1, \ldots, C_N)$. Then, for any $\epsilon = (\epsilon_1, \ldots, \epsilon_N) > 0$ we have the following:

(a) $F^\epsilon$ is bounded over the set $X$, i.e., $\|F^\epsilon(x)\| \le \|C\|$ for all $x \in X$.

(b) $F^\epsilon$ is Lipschitz continuous over the set $X$. More precisely, we have

$$\|F^\epsilon(x) - F^\epsilon(y)\| \le \sqrt{N}\,\|C\|\max_{j=1,\ldots,N}\left\{\kappa_j\,\frac{n_j!!}{(n_j - 1)!!\,\epsilon_j}\right\}\|x - y\| \quad \text{for all } x, y \in X, \tag{29}$$

where $\kappa_j = 1$ when $n_j$ is odd and $\kappa_j = \frac{2}{\pi}$ when $n_j$ is even.
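For concreteness, the constant on the right-hand side of (29) can be computed as follows. The function name and the inputs ($C = (3, 4)$, block dimensions $(1, 2)$, radii $(0.5, 1)$) are hypothetical; $\kappa_j$ and the double factorials follow Lemma 6, with the convention $0!! = 1$.

```python
import math


def double_factorial(m):
    """m!! with the conventions 0!! = (-1)!! = 1."""
    return 1 if m <= 0 else m * double_factorial(m - 2)


def msr_lipschitz_constant(C, dims, eps):
    """sqrt(N) * ||C|| * max_j kappa_j * n_j!! / ((n_j - 1)!! * eps_j), as in (29)."""
    terms = []
    for n_j, e_j in zip(dims, eps):
        kappa = 1.0 if n_j % 2 == 1 else 2.0 / math.pi
        terms.append(kappa * double_factorial(n_j) / (double_factorial(n_j - 1) * e_j))
    return math.sqrt(len(C)) * math.sqrt(sum(c * c for c in C)) * max(terms)


L_eps = msr_lipschitz_constant(C=[3.0, 4.0], dims=[1, 2], eps=[0.5, 1.0])
L_half = msr_lipschitz_constant(C=[3.0, 4.0], dims=[1, 2], eps=[0.25, 0.5])
print(L_eps, L_half)
```

Here the first block dominates ($1/0.5 = 2$ versus $(2/\pi)\cdot 2 \approx 1.27$), giving $\sqrt{2}\cdot 5\cdot 2 = 10\sqrt{2}$. Halving every $\epsilon_j$ doubles the constant, which is the price paid for a closer approximation of $F$.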
Proof. (a) We can bound the norm of $F^\epsilon$ as follows:

$$\|F^\epsilon(x)\| = \sqrt{\sum_{i=1}^N\left\|E[F_i(x + z)]\right\|^2} \le \sqrt{\sum_{i=1}^N E\left[\|F_i(x + z)\|^2\right]} \le \|C\|,$$

where the first inequality follows from Jensen's inequality and the second inequality is due to the boundedness property imposed on $F$ by Assumption 4.

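(b) From the definition of $F^\epsilon$ in relation (27) we have

$$\|F^\epsilon(x) - F^\epsilon(y)\|^2 = \sum_{j=1}^N\left\|E[F_j(x + z)] - E[F_j(y + z)]\right\|^2 = \sum_{j=1}^N\left\|E[F_j(x + z) - F_j(y + z)]\right\|^2.$$

We will add and subtract, sequentially, the values at the vectors $v_i$ of the form $v_i = (y_1 + z_1, \ldots, y_{i-1} + z_{i-1}, x_i + z_i, \ldots, x_N + z_N)$ for $i = 2, \ldots, N$; for convenience we also set $v_1 = x + z$ and $v_{N+1} = y + z$. To keep the resulting expressions in a compact form, we use the following notation. For an index set $J \subseteq \{1, \ldots, N\}$, we let $x_J = (x_i)_{i \in J}$ and $x_{-J} = x_{\{1, \ldots, N\} \setminus J}$, so that $v_i = \big((y+z)_{\{1,\ldots,i-1\}}, (x+z)_{-\{1,\ldots,i-1\}}\big)$. By adding and subtracting the terms $F_j\big((y+z)_{\{1,\ldots,i\}}, (x+z)_{-\{1,\ldots,i\}}\big)$ for all $i$, from the preceding relation we obtain

$$\begin{aligned} \|F^\epsilon(x) - F^\epsilon(y)\|^2 = \sum_{j=1}^N\Big\| &E\big[F_j(x + z) - F_j\big((y+z)_{\{1\}}, (x+z)_{-\{1\}}\big)\big] \\ &+ E\big[F_j\big((y+z)_{\{1\}}, (x+z)_{-\{1\}}\big) - F_j\big((y+z)_{\{1,2\}}, (x+z)_{-\{1,2\}}\big)\big] \\ &+ \cdots + E\big[F_j\big((y+z)_{\{1,\ldots,i-1\}}, (x+z)_{-\{1,\ldots,i-1\}}\big) - F_j\big((y+z)_{\{1,\ldots,i\}}, (x+z)_{-\{1,\ldots,i\}}\big)\big] \\ &+ \cdots + E\big[F_j\big((y+z)_{\{1,\ldots,N-2\}}, (x+z)_{-\{1,\ldots,N-2\}}\big) - F_j\big((y+z)_{\{1,\ldots,N-1\}}, (x+z)_{-\{1,\ldots,N-1\}}\big)\big] \\ &+ E\big[F_j\big((y+z)_{\{1,\ldots,N-1\}}, (x+z)_{-\{1,\ldots,N-1\}}\big) - F_j(y + z)\big]\Big\|^2. \end{aligned}$$

Considering the definition of the vectors $v_1, \ldots, v_{N+1}$ in the preceding relation, we have

$$\|F^\epsilon(x) - F^\epsilon(y)\|^2 = \sum_{j=1}^N\Big\|\sum_{i=1}^N E[F_j(v_i) - F_j(v_{i+1})]\Big\|^2 \le N\sum_{j=1}^N\sum_{i=1}^N\big\|E[F_j(v_i) - F_j(v_{i+1})]\big\|^2,$$

where the inequality follows by the convexity of the squared norm. By using the definitions of $v_i$ and exchanging the order of summations in the preceding relation, we obtain

$$\|F^\epsilon(x) - F^\epsilon(y)\|^2 \le N\,\underbrace{\sum_{j=1}^N\left\|E\big[F_j\big((x+z)_{\{1\}}, (x+z)_{-\{1\}}\big) - F_j\big((y+z)_{\{1\}}, (x+z)_{-\{1\}}\big)\big]\right\|^2}_{\text{Term } 1} + N\sum_{i=2}^N\underbrace{\sum_{j=1}^N\left\|E\big[F_j\big((y+z)_{\{1,\ldots,i-1\}}, (x+z)_{-\{1,\ldots,i-1\}}\big) - F_j\big((y+z)_{\{1,\ldots,i\}}, (x+z)_{-\{1,\ldots,i\}}\big)\big]\right\|^2}_{\text{Term } i}. \tag{30}$$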
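Next, we derive an estimate for Term 1. From our notation, it follows that for a vector $x$, $x_{\{1\}} = x_1$. In the interest of brevity, in the following, for a vector $x$, we use $x_{-1} = x_{-\{1\}}$. Recalling the definition of $p_u$ in (28), here taken on $B_{n_1}(0, \epsilon_1)$, we write

$$\begin{aligned} \text{Term } 1 &= \sum_{j=1}^N\left\|E\left[\int_{\mathbb{R}^{n_1}} F_j(x_1 + z_1, x_{-1} + z_{-1})\,p_u(z_1)\,dz_1 - \int_{\mathbb{R}^{n_1}} F_j(y_1 + z_1, x_{-1} + z_{-1})\,p_u(z_1)\,dz_1\right]\right\|^2 \\ &= \sum_{j=1}^N\left\|E\left[\int_{\mathbb{R}^{n_1}} F_j(s_1, x_{-1} + z_{-1})\,p_u(s_1 - x_1)\,ds_1 - \int_{\mathbb{R}^{n_1}} F_j(t_1, x_{-1} + z_{-1})\,p_u(t_1 - y_1)\,dt_1\right]\right\|^2 \\ &= \sum_{j=1}^N\left\|\int_{\mathbb{R}^{n_1}} E[F_j(t_1, x_{-1} + z_{-1})]\left(p_u(t_1 - x_1) - p_u(t_1 - y_1)\right)dt_1\right\|^2, \end{aligned}$$

where the expectation is taken with respect to $z_{-1}$, and in the second equality $s_1$ and $t_1$ are given by $s_1 = x_1 + z_1$ and $t_1 = y_1 + z_1$. Using the triangle inequality and Jensen's inequality, we obtain

$$\text{Term } 1 \le \sum_{j=1}^N\left(\int_{\mathbb{R}^{n_1}} E\left[\|F_j(t_1, x_{-1} + z_{-1})\|\right]\left|p_u(t_1 - x_1) - p_u(t_1 - y_1)\right|dt_1\right)^2.$$

By the definition of $F_j$ and Assumption 4, the preceding relation yields

$$\text{Term } 1 \le \sum_{j=1}^N\left(\int_{\mathbb{R}^{n_1}} C_j\left|p_u(t_1 - x_1) - p_u(t_1 - y_1)\right|dt_1\right)^2 \le \left(\kappa_1\,\frac{n_1!!}{(n_1 - 1)!!\,\epsilon_1}\right)^2\|x_1 - y_1\|^2\sum_{j=1}^N C_j^2,$$

where the last inequality is obtained using Lemma 6. Similarly, we may find estimates for the other terms in relation (30). Therefore, from relation (30) we may conclude that

$$\|F^\epsilon(x) - F^\epsilon(y)\|^2 \le N\left(\sum_{j=1}^N C_j^2\right)\sum_{i=1}^N\left(\kappa_i\,\frac{n_i!!}{(n_i - 1)!!\,\epsilon_i}\right)^2\|x_i - y_i\|^2 \le N\|C\|^2\max_{t\in\{1,\ldots,N\}}\left\{\kappa_t\,\frac{n_t!!}{(n_t - 1)!!\,\epsilon_t}\right\}^2\|x - y\|^2.$$

Therefore, we have

$$\|F^\epsilon(x) - F^\epsilon(y)\| \le \sqrt{N}\,\|C\|\max_{t\in\{1,\ldots,N\}}\left\{\kappa_t\,\frac{n_t!!}{(n_t - 1)!!\,\epsilon_t}\right\}\|x - y\|.$$

□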



Remark: The MSR scheme is a generalization of the local randomization smoothing scheme presented 
in [41]. Note that when N = 1, the Lipschitz constant given in Proposition 6b is precisely the constant given 
by Lemma 8 in [41]. □ 

4.2.2 Multi-cubic randomized smoothing scheme 

We begin by defining $C_n(x, \rho) \subset \mathbb{R}^n$ as a cube centered at a point $x$ with the edge length $2\rho > 0$, where the edges are along the coordinate axes. More precisely,

$$C_n(x, \rho) \triangleq \{y \in \mathbb{R}^n \mid \|y - x\|_\infty \le \rho\}.$$
In the MCR scheme, we assume that for any $i = 1, \ldots, N$, the random vector $z_i$ is uniformly distributed on the set $C_{n_i}(0, \epsilon_i)$ and is independent of the other random vectors $z_j$ for $j \ne i$. For the mapping $F$ we will assume that it is well-defined over the set $X_c^\epsilon$ given by

$$X_c^\epsilon \triangleq X + \prod_{i=1}^N C_{n_i}(0, \epsilon_i),$$

where $\epsilon_i > 0$ are given values and $\epsilon = (\epsilon_1, \ldots, \epsilon_N)$, while the subscript $c$ stands for the MCR scheme. We investigate the properties of $F^\epsilon$ for this smoothing scheme under the following basic assumption.

Assumption 5. The mapping $F : X_c^\epsilon \to \mathbb{R}^n$ is bounded over the set $X_c^\epsilon$. Specifically, for every $i = 1, \ldots, N$, there exists a constant $C_i' > 0$ such that $\|F_i(x)\| \le C_i'$ for all $x \in X_c^\epsilon$.

The following lemma provides a simple relation that will be important in establishing the main property 
of the density function used in the MCR scheme. 

Lemma 7. Let the vector $p \in \mathbb{R}^m$ be such that $0 \le p_i \le 1$ for all $i = 1, \ldots, m$. Then, we have

$$1 - \prod_{i=1}^m(1 - p_i) \le \|p\|_1.$$

Proof. We use induction on $m$ to prove this result. For $m = 1$, we have $1 - \prod_{i=1}^1(1 - p_i) = p_1 = \|p\|_1$, implying that the result holds for $m = 1$. Let us assume that $1 - \prod_{i=1}^m(1 - p_i) \le \|p\|_1$ holds for $m$. Therefore, we have

$$\prod_{i=1}^m(1 - p_i) \ge 1 - \sum_{i=1}^m p_i.$$

Multiplying both sides of the preceding relation by $(1 - p_{m+1}) \ge 0$, we obtain

$$\prod_{i=1}^{m+1}(1 - p_i) \ge \left(1 - \sum_{i=1}^m p_i\right)(1 - p_{m+1}) = 1 - \sum_{i=1}^{m+1}p_i + p_{m+1}\sum_{i=1}^m p_i \ge 1 - \sum_{i=1}^{m+1}p_i.$$

Hence, $\prod_{i=1}^{m+1}(1 - p_i) \ge 1 - \sum_{i=1}^{m+1}p_i$, which implies that the result holds for $m + 1$. Therefore, we conclude that the result holds for any integer $m \ge 1$. □
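Lemma 7 is an instance of the union bound: if $p_i$ are probabilities of independent events, the left-hand side is the probability that at least one of them occurs. The short sketch below (random dimensions and entries are illustrative assumptions) evaluates the gap $\|p\|_1 - \left(1 - \prod_i(1 - p_i)\right)$ on random vectors in $[0, 1)^m$ and records the smallest value observed.

```python
import math
import random


def lemma7_gap(p):
    """||p||_1 - (1 - prod_i (1 - p_i)); Lemma 7 asserts this is nonnegative on [0, 1]^m."""
    return sum(p) - (1.0 - math.prod(1.0 - pi for pi in p))


rng = random.Random(0)
worst = min(lemma7_gap([rng.random() for _ in range(rng.randint(1, 8))])
            for _ in range(10000))
print("smallest gap over random trials:", worst)
```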

The following result is crucial for establishing the properties of the approximation F e obtained by the 
MCR smoothing scheme. 



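Lemma 8. Let $z \in \mathbb{R}^n$ be a random vector with a zero-mean uniform density over an $n$-dimensional cube $\prod_{i=1}^N C_{n_i}(0, \epsilon_i)$ with $\epsilon_i > 0$ for all $i$. Let the function $p_c : \mathbb{R}^n \to \mathbb{R}$ be the probability density function of the random vector $z$:

$$p_c(z) = \begin{cases} \frac{1}{2^n\prod_{i=1}^N\epsilon_i^{n_i}} & \text{for } z \in \prod_{i=1}^N C_{n_i}(0, \epsilon_i), \\ 0 & \text{otherwise}. \end{cases}$$

Then, the following relation holds:

$$\int_{\mathbb{R}^n}|p_c(u - x) - p_c(u - y)|\,du \le \frac{\sqrt{n}}{\min_{1\le i\le N}\{\epsilon_i\}}\,\|x - y\| \quad \text{for all } x, y \in \mathbb{R}^n.$$

Proof. Let $x, y \in \mathbb{R}^n$ be arbitrary. To simplify the notation, we define the sets $S_x = \prod_{i=1}^N C_{n_i}(x_i, \epsilon_i)$ and $S_y = \prod_{i=1}^N C_{n_i}(y_i, \epsilon_i)$. We consider, separately, the case when the cubes $S_x$ and $S_y$ do not intersect, and the case when they do intersect. Before we proceed, we prove the following relation:

$$S_x \cap S_y \ne \emptyset \quad \text{if and only if} \quad \|x_i - y_i\|_\infty \le 2\epsilon_i \text{ for all } i = 1, \ldots, N. \tag{31}$$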
To prove relation (31), suppose that the two cubes have a nonempty intersection and let $u$ be in the intersection, i.e., $u \in S_x \cap S_y$. Then, by the triangle inequality, we have for all $i = 1, \ldots, N$,

$$\|x_i - y_i\|_\infty \le \|x_i - u_i\|_\infty + \|u_i - y_i\|_\infty \le 2\epsilon_i,$$

where the last inequality follows from the fact that $u$ belongs to each of the two cubes. Thus, when $S_x \cap S_y \ne \emptyset$, we have $\|x_i - y_i\|_\infty \le 2\epsilon_i$ for all $i$. Conversely, suppose now that $\|x_i - y_i\|_\infty \le 2\epsilon_i$ holds for all $i = 1, \ldots, N$. Let $u = (x + y)/2$, and note that by the homogeneity of the norm $\|\cdot\|_\infty$, we have

$$\|u_i - x_i\|_\infty = \left\|\frac{y_i - x_i}{2}\right\|_\infty \le \epsilon_i \quad \text{for all } i.$$

Thus, it follows that $u \in S_x$. Similarly, we find that $\|u_i - y_i\|_\infty \le \epsilon_i$ for all $i$, which implies that $u \in S_y$. Hence, $u \in S_x \cap S_y$, thus showing that the two cubes have a nonempty intersection.

Figure 2: Calculating the Lipschitz constant in the locally randomized schemes. (a) MCR scheme; (b) MSR scheme.

We now consider the integral $\int_{\mathbb{R}^n}|p_c(u - x) - p_c(u - y)|\,du$ for the cases when the cubes do not intersect and when they do intersect.

Case 1: $S_x \cap S_y = \emptyset$. In this case, the integrand vanishes outside $S_x \cup S_y$, and we have

$$\int_{S_x}|p_c(u - x) - p_c(u - y)|\,du = \int_{S_x}p_c(u - x)\,du, \qquad \int_{S_y}|p_c(u - x) - p_c(u - y)|\,du = \int_{S_y}p_c(u - y)\,du.$$

Consequently,

$$\int_{\mathbb{R}^n}|p_c(u - x) - p_c(u - y)|\,du = \int_{S_x}p_c(u - x)\,du + \int_{S_y}p_c(u - y)\,du = 2. \tag{32}$$

By relation (31), there must exist some index $i^* \in \{1, \ldots, N\}$ such that $\|x_{i^*} - y_{i^*}\|_\infty > 2\epsilon_{i^*}$. Since $\|x - y\|_\infty \ge \|x_{i^*} - y_{i^*}\|_\infty$, it follows that $\frac{\|x - y\|_\infty}{\min_{1\le i\le N}\{\epsilon_i\}} > 2$. Using the relationship $\|u\|_\infty \le \|u\|$ between the infinity-norm and the Euclidean norm, we obtain $\frac{\|x - y\|}{\min_{1\le i\le N}\{\epsilon_i\}} > 2$. Therefore, using (32), we have

$$\int_{\mathbb{R}^n}|p_c(u - x) - p_c(u - y)|\,du < \frac{1}{\min_{1\le i\le N}\{\epsilon_i\}}\,\|x - y\|. \tag{33}$$



Case 2: S x H S y ^ 0. Then, we may decompose the integral as follows: 

/ \p c (u - x) -p c (u - y)\du = / \p c (u-x) -p c (u-y)\du+ / \p c (u - x) - p c (u - y)\du 
JR n Js„ns„ Jss.ns?. 



+ / \p c (u - x) - p c (u - y)\du + / \p c (u - x) -p c (u - y)\du. 

J S X \Sy J Sy\S X 

Note that the first two integrals on the right hand side of the preceding equality are zero since p c (u — x) = 
p c (u — y) in the corresponding regions. Figure 2a illustrates this observation 2 . Therefore, we have 

\p c (u — x) — p c (u — y)\du — j p c (u — x)du J c [ p c (u — y)du = 2 ^ / du. 

J S x \Sy J Sy\S x 2 n ]^^_^ £^ J S X \Sy 

Note that the value 2 n YliLi e i i * s ^ ne vomme °f the cube S x , denoted by vo\(S x ). Similarly, the integral 
J s , s du is equal to the volume of the set S x \S y . Thus, we can write 

/ M„ - x) - M u - ,)|* = 2^A^1 = 2 voi(&)- v,^n S ,) , . 

J Rn vol{b x ) vol(6 x ) V vol(6 x ) J 

It can be seen that 

N m 

vo\(s x n s y ) = J] - \ x ^) - vti)\)> 

i=l j = l 

where w(j) denotes the j-th coordinate value of a vector w. Therefore, from the preceding two relations and 
vo\(S x ) = 2 n n£Li e?' we find that 



\p c (u - x) -p c (u - y)\du = 2(1-- 



f^-(nfi(^-Mi)-^)i)) ) 

lli=l e i \i=l j = l ) J 

'(-nn(-^)). 



(34) 



Since the cubes S x and S y do intersect, by relation (31) there must hold \\x{ — yi\\oo < 2e 2 - for all i. Hence. 

2e 



q < IgiCzI y%U)\ < £ or a ij ^ Now, invoking Lemma 7, from (34) we obtain 



i / \ / \ I 7 ^ v^ v> Mi) \\ x i -Villi / II ii 

i=i j=i Z€i i=i €i i=i €i 

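where in the last inequality we used the relation between $\|\cdot\|_1$ and the Euclidean norm. Using Hölder's inequality, we have

$$\sum_{i=1}^N\frac{\sqrt{n_i}\,\|x_i - y_i\|}{\epsilon_i} \leq \frac{1}{\min_{1\le i\le N}\{\epsilon_i\}}\sum_{i=1}^N\sqrt{n_i}\,\|x_i - y_i\| \leq \frac{1}{\min_{1\le i\le N}\{\epsilon_i\}}\sqrt{\sum_{i=1}^N n_i}\,\sqrt{\sum_{i=1}^N\|x_i - y_i\|^2} = \frac{\sqrt{n}}{\min_{1\le i\le N}\{\epsilon_i\}}\,\|x - y\|,$$

implying that

$$\int_{\mathbb{R}^n} |p_c(u-x) - p_c(u-y)|\,du \leq \frac{\sqrt{n}}{\min_{1\le i\le N}\{\epsilon_i\}}\,\|x - y\|. \qquad (35)$$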

By combining (37), (33), and (35), and using the fact that $n \geq 1$, we obtain the desired result. □
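The closed form (34) and the bound of Lemma 8 can be checked numerically. The sketch below is only an illustration: `cube_l1_distance` and `lemma8_bound` are hypothetical helper names, and the test points are drawn at random. It evaluates $2\,(1 - \mathrm{vol}(S_x\cap S_y)/\mathrm{vol}(S_x))$ coordinate by coordinate, which also covers Case 1 (a zero overlap factor yields the value 2), and compares the result with $\sqrt{n}\,\|x-y\|/\min_i\{\epsilon_i\}$.

```python
import math
import random


def cube_l1_distance(x, y, eps, dims):
    """Integral of |p_c(u - x) - p_c(u - y)| over R^n via the overlap formula (34).

    x, y : flat lists of length n = sum(dims)
    eps  : half-widths eps_i of the cubes C_{n_i}(0, eps_i), one per block
    dims : block sizes n_i
    """
    overlap_ratio = 1.0  # vol(S_x ∩ S_y) / vol(S_x)
    pos = 0
    for e, ni in zip(eps, dims):
        for j in range(ni):
            gap = abs(x[pos + j] - y[pos + j])
            # Overlap length along this coordinate divided by the side 2*eps_i;
            # it becomes zero when the cubes are disjoint (Case 1).
            overlap_ratio *= max(0.0, 1.0 - gap / (2.0 * e))
        pos += ni
    return 2.0 * (1.0 - overlap_ratio)


def lemma8_bound(x, y, eps, dims):
    """Right-hand side sqrt(n) / min_i eps_i * ||x - y|| of Lemma 8."""
    n = sum(dims)
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.sqrt(n) / min(eps) * dist


if __name__ == "__main__":
    rng = random.Random(0)
    dims, eps = [2, 3], [0.5, 1.0]
    worst = 0.0
    for _ in range(10000):
        x = [rng.uniform(-1.0, 1.0) for _ in range(sum(dims))]
        y = [rng.uniform(-1.0, 1.0) for _ in range(sum(dims))]
        worst = max(worst, cube_l1_distance(x, y, eps, dims) / lemma8_bound(x, y, eps, dims))
    print("largest ratio distance / bound:", worst)  # should not exceed 1
```

In one dimension with $\epsilon = 1$ and $|x - y| = 0.5$, both sides equal $0.5$, so the bound is attained in that case.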

Analogous to Proposition 6, the next proposition derives the Lipschitz constant and boundedness properties of the approximation $F^{\epsilon}$ under the MCR scheme.



2 Figure 2b provides a similar graphic for the MSR scheme. 
Proposition 7 (Lipschitz continuity and boundedness of $F^{\epsilon}$ under the MCR scheme). Let Assumption 5 hold and define the vector $C' = (C'_1, \ldots, C'_N)$. Then, for any $\epsilon = (\epsilon_1, \ldots, \epsilon_N) > 0$ we have the following:

(a) $F^{\epsilon}$ is bounded over the set $X$, i.e., $\|F^{\epsilon}(x)\| \leq \|C'\|$ for all $x \in X$.

(b) $F^{\epsilon}$ is Lipschitz over the set $X$. More precisely, we have

$$\|F^{\epsilon}(x) - F^{\epsilon}(y)\| \leq \frac{\sqrt{n}\,\|C'\|}{\min_{j=1,\ldots,N}\{\epsilon_j\}}\,\|x - y\| \quad \text{for all } x, y \in X. \qquad (36)$$



Proof. (a) This result can be shown in a similar fashion to the proof of Proposition 6a.
(b) Since the random vector $z_i$ is uniformly distributed on the set $C_{n_i}(0, \epsilon_i)$ for each $i = 1, \ldots, N$, the random vector $z = (z_1; \ldots; z_N)$ is uniformly distributed on the set $\prod_{i=1}^N C_{n_i}(0, \epsilon_i)$. By the definition of the approximation $F^{\epsilon}$ in (27), it follows that for any $x, y \in X$,

$$\begin{aligned}
\|F^{\epsilon}(x) - F^{\epsilon}(y)\| &= \left\|\int_{\mathbb{R}^n} F(x+z)\,p_c(z)\,dz - \int_{\mathbb{R}^n} F(y+z)\,p_c(z)\,dz\right\| \\
&= \left\|\int_{\mathbb{R}^n} F(u)\,p_c(u-x)\,du - \int_{\mathbb{R}^n} F(v)\,p_c(v-y)\,dv\right\| \\
&= \left\|\int_{\mathbb{R}^n} F(u)\left(p_c(u-x) - p_c(u-y)\right)du\right\| \\
&\leq \int_{\mathbb{R}^n} \|F(u)\|\,|p_c(u-x) - p_c(u-y)|\,du,
\end{aligned}$$

where in the second equality we let $u = x + z$ and $v = y + z$, while the inequality follows from the triangle inequality. Invoking Assumption 5 we obtain

$$\|F^{\epsilon}(x) - F^{\epsilon}(y)\| \leq \|C'\|\int_{\mathbb{R}^n}|p_c(u-x) - p_c(u-y)|\,du. \qquad (37)$$

The desired relation follows from relation (37) and Lemma 8. □
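A one-dimensional example makes Proposition 7(b) concrete. Take the hypothetical map $F(x) = \mathrm{sign}(x)$ on $\mathbb{R}$ (so $n = N = 1$), which is bounded by $C' = 1$ but is discontinuous at $0$ and hence not Lipschitz. With $z$ uniform on $C_1(0,\epsilon) = [-\epsilon, \epsilon]$, a direct calculation gives $F^{\epsilon}(x) = \mathbb{E}[\mathrm{sign}(x+z)] = \max\{-1, \min\{1, x/\epsilon\}\}$, whose Lipschitz constant $1/\epsilon$ coincides with the bound $\sqrt{n}\,\|C'\|/\min_j\{\epsilon_j\}$ in (36). The sketch below, with illustrative helper names, compares this closed form with a Monte Carlo estimate and checks difference quotients on a grid.

```python
import random


def sign(v):
    """Hypothetical F(x) = sign(x): bounded by 1 but discontinuous at 0."""
    return (v > 0) - (v < 0)


def smoothed_sign(x, eps):
    """Closed form of F^eps(x) = E[sign(x + z)] with z ~ Uniform[-eps, eps]."""
    return max(-1.0, min(1.0, x / eps))


def smoothed_sign_mc(x, eps, samples, rng):
    """Monte Carlo estimate of the same expectation."""
    return sum(sign(x + rng.uniform(-eps, eps)) for _ in range(samples)) / samples


if __name__ == "__main__":
    eps = 0.1
    bound = 1.0 / eps  # sqrt(n) * ||C'|| / eps with n = 1 and ||C'|| = 1
    grid = [-0.3 + 0.01 * k for k in range(61)]
    slopes = [abs(smoothed_sign(a, eps) - smoothed_sign(b, eps)) / (b - a)
              for a, b in zip(grid, grid[1:])]
    print("max difference quotient:", max(slopes), "bound:", bound)
    rng = random.Random(1)
    print("F^eps(0.05): exact", smoothed_sign(0.05, eps),
          "Monte Carlo", smoothed_sign_mc(0.05, eps, 20000, rng))
```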



4.3 A distributed locally randomized SA scheme 

The locally randomized schemes presented in Section 4.2 facilitate the construction of a distributed locally randomized SA scheme. Consider the Cartesian stochastic variational inequality problem VI$(X, F^{\epsilon})$ given in (27), where the mapping $F$ is not necessarily Lipschitz. In this section, we assume that the conditions of the MSR scheme are satisfied, i.e., for all $i = 1, \ldots, N$, the random vector $z_i$ is uniformly distributed over the set $B_{n_i}(0, \epsilon_i)$ independently of the other random vectors $z_j$ for $j \neq i$, and the mapping $F$ in (2) is defined over the set $X_s^{\epsilon}$. Let the sequence $\{x_k\}$ be given by

$$x_{k+1,i} = \Pi_{X_i}\left(x_{k,i} - \gamma_{k,i}\,\Phi_i(x_k + z_k, \xi_k)\right), \qquad (38)$$

for all $k \geq 0$ and $i = 1, \ldots, N$, where $\gamma_{k,i} > 0$ denotes the stepsize of the $i$-th agent at iteration $k$, $x_k = (x_{k,1}; x_{k,2}; \ldots; x_{k,N})$, and $z_k = (z_{k,1}; z_{k,2}; \ldots; z_{k,N})$. The following proposition proves the almost-sure convergence of the iterates generated by algorithm (38) to the solution of the approximation VI$(X, F^{\epsilon})$. In this result, we show that the approximation does indeed satisfy the assumptions of Proposition 3, so that convergence can then be immediately claimed. We define $\mathcal{F}'_k$, the history of the method up to time $k$, as

$$\mathcal{F}'_k = \{x_0, z_0, \xi_0, z_1, \xi_1, \ldots, z_{k-1}, \xi_{k-1}\}$$

for $k \geq 1$ and $\mathcal{F}'_0 = \{x_0\}$. We assume that, at any iteration, the vectors $z_k$ and $\xi_k$ in (38) are independent given the history $\mathcal{F}'_k$.
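To illustrate iteration (38), the sketch below runs it on a hypothetical two-agent problem with scalar decisions $x_i \in X_i = [0, 1]$, the linear map $F(x) = Mx - q$ with $M = \begin{pmatrix}2 & 0.5\\ 0.5 & 2\end{pmatrix}$ (strongly monotone with constant $1.5$), and a noisy oracle $\Phi(x, \xi) = F(x) + \xi$ with Gaussian $\xi$. These problem data, the harmonic stepsize $\gamma_{k,i} = \theta/k$ used in place of the adaptive rule (25)-(26), and all function names are assumptions made for the illustration. Because $F$ is linear and $\mathbb{E}[z] = 0$, we have $F^{\epsilon} = F$, so the solution of VI$(X, F^{\epsilon})$ is the interior point $x^* = M^{-1}q = (0.3, 0.6)$ and the final iterate can be compared with it directly.

```python
import math
import random


def sample_ball(dim, radius, rng):
    """Uniform sample from the Euclidean ball B_dim(0, radius) (MSR scheme)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    r = radius * rng.random() ** (1.0 / dim)  # radial law of a uniform point in a ball
    return [r * v / norm for v in g]


def project_box(v, lo, hi):
    """Euclidean projection onto the box [lo, hi]^len(v), standing in for Pi_{X_i}."""
    return [min(hi, max(lo, t)) for t in v]


def locally_randomized_sa(F, blocks, eps, theta, iters, noise, seed=0):
    """Iteration (38) with harmonic stepsizes gamma_{k,i} = theta / k (an assumption)."""
    rng = random.Random(seed)
    x = [0.0] * sum(blocks)
    for k in range(1, iters + 1):
        # Each agent i perturbs its own block with z_{k,i} ~ Uniform(B_{n_i}(0, eps_i)).
        z = []
        for ni, ei in zip(blocks, eps):
            z.extend(sample_ball(ni, ei, rng))
        # Hypothetical oracle: Phi(x_k + z_k, xi_k) = F(x_k + z_k) + Gaussian noise.
        g = [fv + rng.gauss(0.0, noise) for fv in F([a + b for a, b in zip(x, z)])]
        step = theta / k
        new_x, pos = [], 0
        for ni in blocks:  # each agent updates and projects only its own block
            block = [x[pos + j] - step * g[pos + j] for j in range(ni)]
            new_x.extend(project_box(block, 0.0, 1.0))
            pos += ni
        x = new_x
    return x


def linear_map(x):
    """Hypothetical F(x) = M x - q whose VI solution on [0, 1]^2 is x* = (0.3, 0.6)."""
    M = [[2.0, 0.5], [0.5, 2.0]]
    q = [0.9, 1.35]
    return [sum(M[p][j] * x[j] for j in range(2)) - q[p] for p in range(2)]


if __name__ == "__main__":
    x_final = locally_randomized_sa(linear_map, blocks=[1, 1], eps=[0.1, 0.1],
                                    theta=1.0, iters=20000, noise=0.1)
    print("final iterate:", x_final)
```

Each agent draws its own perturbation and projects only onto its own set $X_i$, which is what makes the scheme distributed.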

Proposition 8 (Almost-sure convergence of the locally randomized DASA scheme). Let Assumptions 1a, 3, and 4 hold, and suppose that the mapping $F$ is strongly monotone on the set $X_s^{\epsilon}$ with a constant $\eta > 0$. Also, assume that, for each $i = 1, \ldots, N$, there exists a constant $\nu_i > 0$ such that

$$\mathbb{E}\left[\|\Phi_i(x_k + z_k, \xi_k) - F_i(x_k + z_k)\|^2 \mid \mathcal{F}'_k\right] \leq \nu_i^2 \quad \text{a.s. for all } k \geq 0. \qquad (39)$$

Then, the sequence $\{x_k\}$ generated by algorithm (38) converges almost surely to the unique solution of VI$(X, F^{\epsilon})$.



Proof. Define the random vector $\xi' = (z_1; z_2; \ldots; z_N; \xi)$, allowing us to rewrite algorithm (38) as follows:

$$\begin{aligned}
x_{k+1,i} &= \Pi_{X_i}\left(x_{k,i} - \gamma_{k,i}\left(F_i^{\epsilon}(x_k) + w'_{k,i}\right)\right), \\
w'_{k,i} &= \Phi_i(x_k + z_k, \xi_k) - F_i^{\epsilon}(x_k).
\end{aligned} \qquad (40)$$

To prove convergence of the iterates produced by (40), it suffices to show that the conditions of Proposition 3 are satisfied for the set $X$, the mapping $F^{\epsilon}$, and the stochastic errors $w'_{k,i}$.

(i) Since Assumption 4 holds, Proposition 6b implies that the mapping $F^{\epsilon}$ is Lipschitz over the set $X$ with the constant $\sqrt{N}\,\|C\|\max_{1\le j\le N}\left\{\frac{\kappa_j\, n_j!!}{(n_j - 1)!!\,\epsilon_j}\right\}$. Thus, Assumption 1b holds for the mapping $F^{\epsilon}$.

(ii) Next, we show that the mapping $F^{\epsilon}$ is strongly monotone over $X$. Since the mapping $F$ is strongly monotone over the set $X_s^{\epsilon}$ with a constant $\eta > 0$, for any $u, v \in X_s^{\epsilon}$ we have

$$(u - v)^T(F(u) - F(v)) \geq \eta\|u - v\|^2.$$

Moreover, for any $x, y \in X$ and any realization of the random vector $z$, the vectors $x + z$ and $y + z$ belong to the set $X_s^{\epsilon}$. Consequently, by defining $u = x + z$ and $v = y + z$ and noting that $u - v = x - y$, from the previous relation we obtain

$$(x - y)^T(F(x + z) - F(y + z)) \geq \eta\|x - y\|^2.$$

Taking expectations on both sides, it follows that

$$(x - y)^T\left(\mathbb{E}[F(x + z)] - \mathbb{E}[F(y + z)]\right) \geq \eta\|x - y\|^2,$$

which implies that $F^{\epsilon}$ is strongly monotone over the set $X$ with the constant $\eta$.

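(iii) The last step of the proof entails showing that the stochastic errors $w'_k = (w'_{k,1}; w'_{k,2}; \ldots; w'_{k,N})$ are well-defined, i.e., $\mathbb{E}[w'_k \mid \mathcal{F}'_k] = 0$, and that Assumption 2 holds with respect to the stochastic error $w'_k$. Consider the definition of $w'_{k,i}$ in (40). Taking conditional expectations on both sides, we have for all $i = 1, \ldots, N$

$$\mathbb{E}[w'_{k,i} \mid \mathcal{F}'_k] = \mathbb{E}[\Phi_i(x_k + z_k, \xi_k) \mid \mathcal{F}'_k] - F_i^{\epsilon}(x_k) = \mathbb{E}[F_i(x_k + z_k) \mid \mathcal{F}'_k] - F_i^{\epsilon}(x_k) = F_i^{\epsilon}(x_k) - F_i^{\epsilon}(x_k) = 0,$$

where the third equality is obtained using the definition of $F^{\epsilon}$ in (27). Consequently, it suffices to show that the condition of Assumption 2 holds. The quantity to be bounded may be expressed as follows:

$$\mathbb{E}\left[\|w'_k\|^2 \mid \mathcal{F}'_k\right] = \mathbb{E}\left[\sum_{i=1}^N\|\Phi_i(x_k + z_k, \xi_k) - F_i^{\epsilon}(x_k)\|^2 \,\Big|\, \mathcal{F}'_k\right].$$

By adding and subtracting $F_i(x_k + z_k)$ we obtain

$$\begin{aligned}
\mathbb{E}\left[\|w'_k\|^2 \mid \mathcal{F}'_k\right] &\leq 2\,\mathbb{E}\left[\sum_{i=1}^N\left(\|\Phi_i(x_k + z_k, \xi_k) - F_i(x_k + z_k)\|^2 + \|F_i(x_k + z_k) - F_i^{\epsilon}(x_k)\|^2\right) \Big|\, \mathcal{F}'_k\right] \\
&= 2\sum_{i=1}^N\mathbb{E}\left[\mathbb{E}\left[\|\Phi_i(x_k + z_k, \xi_k) - F_i(x_k + z_k)\|^2 \mid \mathcal{F}'_k, z_k\right] \mid \mathcal{F}'_k\right] \\
&\quad + 2\sum_{i=1}^N\mathbb{E}\left[\|F_i(x_k + z_k)\|^2 - \|F_i^{\epsilon}(x_k)\|^2 \mid \mathcal{F}'_k\right],
\end{aligned}$$

where the last term is obtained from the following relation:

$$\mathbb{E}\left[F_i(x_k + z_k)^T F_i^{\epsilon}(x_k) \mid \mathcal{F}'_k\right] = \mathbb{E}\left[F_i(x_k + z_k)^T F_i^{\epsilon}(x_k) \mid x_k\right] = \|F_i^{\epsilon}(x_k)\|^2.$$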
Using the assumption on the errors given in (39), we further obtain

$$\mathbb{E}\left[\|w'_k\|^2 \mid \mathcal{F}'_k\right] \leq 2\sum_{i=1}^N\nu_i^2 + 2\sum_{i=1}^N\mathbb{E}\left[\|F_i(x_k + z_k)\|^2 - \|F_i^{\epsilon}(x_k)\|^2 \mid \mathcal{F}'_k\right]. \qquad (41)$$

Furthermore, we have

$$\sum_{i=1}^N\mathbb{E}\left[\|F_i(x_k + z_k)\|^2 - \|F_i^{\epsilon}(x_k)\|^2 \mid \mathcal{F}'_k\right] \leq \sum_{i=1}^N\mathbb{E}\left[\|F_i(x_k + z_k)\|^2 \mid \mathcal{F}'_k\right] \leq \|C\|^2, \qquad (42)$$

where we use the fact that $x_k + z_k \in X_s^{\epsilon}$ and the assumption that $F$ is uniformly bounded over the set $X_s^{\epsilon}$ (cf. Assumption 4). Relations (41)-(42) imply that the stochastic errors $\{w'_k\}$ satisfy Assumption 2. Thus, the conditions of Proposition 3 are satisfied for the set $X$, the mapping $F^{\epsilon}$, and the stochastic errors $w'_{k,i}$, and the convergence result follows. □
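The relation used for the last term in the proof above can be written out as a short worked step. Expanding the square and using $\mathbb{E}[F_i(x_k + z_k) \mid \mathcal{F}'_k] = F_i^{\epsilon}(x_k)$, which follows from the definition (27), gives

$$\begin{aligned}
\mathbb{E}\left[\|F_i(x_k + z_k) - F_i^{\epsilon}(x_k)\|^2 \mid \mathcal{F}'_k\right]
&= \mathbb{E}\left[\|F_i(x_k + z_k)\|^2 \mid \mathcal{F}'_k\right] - 2\,\mathbb{E}\left[F_i(x_k + z_k) \mid \mathcal{F}'_k\right]^T F_i^{\epsilon}(x_k) + \|F_i^{\epsilon}(x_k)\|^2 \\
&= \mathbb{E}\left[\|F_i(x_k + z_k)\|^2 \mid \mathcal{F}'_k\right] - \|F_i^{\epsilon}(x_k)\|^2,
\end{aligned}$$

so the bracketed quantity in (41) and (42) is a conditional second moment of the deviation $F_i(x_k + z_k) - F_i^{\epsilon}(x_k)$ and is therefore nonnegative.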

The distributed locally randomized SA scheme produces a solution that is an approximation to the true solution. A natural question is whether the sequence of approximations tends to the solution of VI$(X, F)$ as $\epsilon$, the size of the support of the randomization, tends to zero. The following proposition resolves this question in the affirmative.

Proposition 9. Let Assumption 1a hold, and suppose that the mapping $F$ is continuous and strongly monotone over the set $X_s^{\epsilon}$. Let $x^{\epsilon}$ and $x^*$ denote the solutions of VI$(X, F^{\epsilon})$ and VI$(X, F)$, respectively. Then $x^{\epsilon} \to x^*$ when $\epsilon \to 0$.

Proof. As shown in the proof of Proposition 8, $F^{\epsilon}$ is also strongly monotone over the set $X$ with constant $\eta$. Since the set $X$ is assumed to be closed and convex, the definition of $X_s^{\epsilon}$ implies that $X_s^{\epsilon}$ is also closed and convex. Thus, the existence and uniqueness of the solutions to VI$(X, F)$ and VI$(X, F^{\epsilon})$ are guaranteed by Theorem 2.3.3 of [11].

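Let $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_N)$ with $\epsilon_i > 0$ for all $i$ be arbitrary, and let $x^{\epsilon}$ denote the solution to VI$(X, F^{\epsilon})$. Let $x^*$ be the solution to VI$(X, F)$. Since $x^{\epsilon}$ is the solution to VI$(X, F^{\epsilon})$, we have $(x^* - x^{\epsilon})^T F^{\epsilon}(x^{\epsilon}) \geq 0$. Similarly, since $x^*$ is the solution to VI$(X, F)$, we have $(x^{\epsilon} - x^*)^T F(x^*) \geq 0$. Adding the preceding two inequalities, we obtain

$$(x^* - x^{\epsilon})^T\left(F^{\epsilon}(x^{\epsilon}) - F(x^*)\right) \geq 0.$$

Adding and subtracting the term $F^{\epsilon}(x^*)$, we have

$$(x^* - x^{\epsilon})^T\left(F^{\epsilon}(x^{\epsilon}) - F^{\epsilon}(x^*)\right) + (x^* - x^{\epsilon})^T\left(F^{\epsilon}(x^*) - F(x^*)\right) \geq 0,$$

implying that

$$(x^* - x^{\epsilon})^T\left(F^{\epsilon}(x^*) - F(x^*)\right) \geq (x^* - x^{\epsilon})^T\left(F^{\epsilon}(x^*) - F^{\epsilon}(x^{\epsilon})\right) \geq \eta\|x^* - x^{\epsilon}\|^2,$$

where the last inequality follows from the strong monotonicity of the mapping $F^{\epsilon}$. By invoking the Cauchy-Schwarz inequality, we obtain

$$\|F^{\epsilon}(x^*) - F(x^*)\| \geq \eta\|x^* - x^{\epsilon}\|. \qquad (43)$$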
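Next, we show that $\lim_{\epsilon\to 0} F^{\epsilon}(x^*) = F(x^*)$. By the definition of $F^{\epsilon}$ and Jensen's inequality, we have

$$\|F^{\epsilon}(x^*) - F(x^*)\| = \left\|\mathbb{E}[F(x^* + z) - F(x^*)]\right\| \leq \mathbb{E}\left[\|F(x^* + z) - F(x^*)\|\right]. \qquad (44)$$

Then, the expectation on the right-hand side can be expressed as follows:

$$\begin{aligned}
\mathbb{E}\left[\|F(x^* + z) - F(x^*)\|\right] &= \int_{\mathbb{R}^{n_1}}\cdots\int_{\mathbb{R}^{n_N}}\|F(x^* + z) - F(x^*)\|\left(\prod_{i=1}^N p_u(z_i)\right)dz_1\cdots dz_N \\
&= \int_{B_{n_1}(0,\epsilon_1)}\cdots\int_{B_{n_N}(0,\epsilon_N)}\|F(x^* + z) - F(x^*)\|\left(\prod_{i=1}^N p_u(z_i)\right)dz_1\cdots dz_N,
\end{aligned} \qquad (45)$$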
where the second equality is a consequence of the definition of the random vector $z$. Let $\delta > 0$ be an arbitrary fixed number. By the continuity of $F$ over $X_s^{\epsilon}$, there exists a $\delta' > 0$ such that if $\|(x^* + z) - x^*\| < \delta'$, then $\|F(x^* + z) - F(x^*)\| < \delta$. Therefore, for all $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_N)$ with $\|\epsilon\| < \delta'$, we have $\|z\| \leq \|\epsilon\| < \delta'$ for $z \in \prod_{i=1}^N B_{n_i}(0, \epsilon_i)$, which is equivalent to $\|(x^* + z) - x^*\| < \delta'$. Hence, $\|F(x^* + z) - F(x^*)\| < \delta$ for all $z \in \prod_{i=1}^N B_{n_i}(0, \epsilon_i)$ with $\epsilon$ such that $\|\epsilon\| < \delta'$. Thus, using (44) and (45), for any $\epsilon = (\epsilon_1, \ldots, \epsilon_N)$ with $\|\epsilon\| < \delta'$, we have

$$\|F^{\epsilon}(x^*) - F(x^*)\| \leq \delta\int_{B_{n_1}(0,\epsilon_1)}\cdots\int_{B_{n_N}(0,\epsilon_N)}\left(\prod_{i=1}^N p_u(z_i)\right)dz_1\cdots dz_N = \delta.$$

Since $\delta > 0$ was arbitrary, we conclude that $\lim_{\epsilon\to 0}\|F^{\epsilon}(x^*) - F(x^*)\| = 0$. Therefore, taking limits on both sides of inequality (43), we obtain $\lim_{\epsilon\to 0}\|x^* - x^{\epsilon}\| = 0$. □

Remark: The results of Propositions 8 and 9 also hold when the random vector $z$ satisfies the conditions of the MCR scheme.
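Proposition 9 can be observed on a hypothetical scalar example chosen only for illustration: $X = [0, 2]$ and $F(x) = \sqrt[3]{x} + x - 0.1$, which is continuous and strongly monotone with $\eta = 1$ but not Lipschitz near $0$. With $z$ uniform on $[-\epsilon, \epsilon]$, the antiderivative $G(u) = \tfrac{3}{4}|u|^{4/3}$ of $\sqrt[3]{u}$ gives $F^{\epsilon}(x) = \frac{G(x+\epsilon) - G(x-\epsilon)}{2\epsilon} + x - 0.1$ in closed form. Since both maps are increasing, the solution of a VI on an interval is either an endpoint or a root, which the sketch finds by bisection; the gap $|x^{\epsilon} - x^*|$ should shrink as $\epsilon \to 0$. All names are illustrative.

```python
import math


def cbrt(u):
    """Real cube root, valid for negative arguments as well."""
    return math.copysign(abs(u) ** (1.0 / 3.0), u)


def F(x, c=0.1):
    """Hypothetical map: continuous, strongly monotone (eta = 1), not Lipschitz at 0."""
    return cbrt(x) + x - c


def F_eps(x, eps, c=0.1):
    """F^eps(x) = E[F(x + z)] with z ~ Uniform[-eps, eps], in closed form."""
    def G(u):
        return 0.75 * abs(u) ** (4.0 / 3.0)  # antiderivative of cbrt
    return (G(x + eps) - G(x - eps)) / (2.0 * eps) + x - c


def solve_vi_interval(f, lo, hi, tol=1e-12):
    """Solution of VI([lo, hi], f) for a continuous increasing scalar map f."""
    if f(lo) >= 0.0:
        return lo
    if f(hi) <= 0.0:
        return hi
    while hi - lo > tol:  # bisection on the sign change of f
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    x_star = solve_vi_interval(F, 0.0, 2.0)
    for eps in (0.1, 0.01, 0.001):
        x_eps = solve_vi_interval(lambda x, e=eps: F_eps(x, e), 0.0, 2.0)
        print(f"eps={eps:g}  x_eps={x_eps:.6f}  |x_eps - x*|={abs(x_eps - x_star):.2e}")
```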



5 Numerical results 

In this section, we report the results of our numerical experiments on two sets of test problems. The first is a stochastic bandwidth-sharing problem in communication networks (Sec. 5.1), while the second is a stochastic Nash-Cournot game (Sec. 5.2). In each instance, we compare the performance of the distributed adaptive stepsize SA scheme (DASA) given by (25)-(26) with that of SA schemes with harmonic stepsize sequences (HSA), where agents use the stepsize $\theta/k$ at iteration $k$. More precisely, we consider three different values of the parameter $\theta$, i.e., $\theta = 0.1$, $1$, and $10$. This diversity of choices allows us to observe the sensitivity of the HSA scheme to different settings of the parameters. In the context of Nash-Cournot games, we use the distributed locally randomized SA scheme described in Sec. 4.3 with the MSR and MCR techniques. In each instance, we conduct a sensitivity analysis in which we consider 12 different parameter settings, categorized into 4 sets. In each set, one parameter is changed while the other parameters are held fixed. We provide 90% confidence intervals of the mean squared error for each of the 12 settings. Our experiments were performed using Matlab 7.12.

5.1 A bandwidth-sharing problem in computer networks

We consider a communication network where users compete for bandwidth. Such a problem can be captured by an optimization framework (cf. [6]). Motivated by this model, we consider a network with 16 nodes, 20 links, and 5 users. Figure 3 shows the configuration of this network. Users have access to different routes as shown in Figure 3. For example, user 1 can access routes 1, 2, and 3. Each user is characterized by a cost function. Additionally, there is a congestion cost function that depends on the aggregate flow. More specifically, the cost function of user $i$ with flow rate (bandwidth) $x_i$ is defined by

$$f_i(x_i, \xi_i) = -\sum_{r\in\mathcal{R}(i)}\xi_i(r)\log(1 + x_i(r)),$$

for $i = 1, \ldots, 5$, where $x = (x_1; \ldots; x_5)$ is the flow decision vector of the users, $\xi = (\xi_1; \ldots; \xi_5)$ is a random parameter corresponding to the different users, $\mathcal{R}(i) = \{1, 2, \ldots, n_i\}$ is the set of routes assigned to the $i$-th user, and $x_i(r)$ and $\xi_i(r)$ are the $r$-th elements of the decision vector $x_i$ and the random vector $\xi_i$, respectively. We assume that $\xi_i(r)$ is drawn from a uniform distribution for each $i$ and $r$. More precisely, $\xi_1(1)$, $\xi_1(2)$, and $\xi_1(3)$ are i.i.d. and uniformly distributed in $[1 - 0.1, 1 + 0.1]$; $\xi_2(1)$ and $\xi_2(2)$ are i.i.d. and uniformly distributed in $[1.4 - 0.2, 1.4 + 0.2]$; $\xi_3(1)$ and $\xi_4(1)$ are independent and uniformly distributed in $[0.8 - 0.05, 0.8 + 0.05]$ and $[1.6 - 0.2, 1.6 + 0.2]$, respectively; and $\xi_5(1)$ and $\xi_5(2)$ are i.i.d. and uniformly distributed in $[1.2 - 0.1, 1.2 + 0.1]$. The links have limited capacities, which are given by

$$b = (10; 15; 15; 20; 10; 10; 20; 30; 25; 15; 20; 15; 10; 10; 15; 15; 20; 20; 25; 40).$$
Figure 3: The bandwidth-sharing problem - the network



We define the routing matrix $A$ that describes the relation between the set of routes $\mathcal{R} = \{1, 2, \ldots, 9\}$ and the set of links $\mathcal{L} = \{1, 2, \ldots, 20\}$: $A_{lr} = 1$ if route $r \in \mathcal{R}$ goes through link $l \in \mathcal{L}$, and $A_{lr} = 0$ otherwise. Using this matrix, the capacity constraints of the links can be described by $Ax \leq b$.
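We formulate this model as a stochastic optimization problem given by

$$\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^N \mathbb{E}[f_i(x_i, \xi_i)] + c(x) \\
\text{subject to} \quad & Ax \leq b, \\
& x \geq 0,
\end{aligned} \qquad (46)$$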



where $c(x)$ is the network congestion cost. We consider a cost of the form $c(x) = \|Ax\|^2$. Problem (46) is a convex optimization problem, and its optimality conditions can be stated as a variational inequality given by $\nabla f(x^*)^T(x - x^*) \geq 0$ for all feasible $x$, where $f(x) = \sum_{i=1}^N \mathbb{E}[f_i(x_i, \xi_i)] + c(x)$. Using our notation in Sec. 2.2, we have

$$F(x) \triangleq \nabla f(x) = -\left(\frac{\bar\xi_1(1)}{1 + x_1(1)}; \ldots; \frac{\bar\xi_i(r_i)}{1 + x_i(r_i)}; \ldots; \frac{\bar\xi_5(2)}{1 + x_5(2)}\right) + 2A^TAx,$$



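where $\bar\xi_i(r_i) = \mathbb{E}[\xi_i(r_i)]$ for any $i = 1, \ldots, 5$ and $r_i = 1, \ldots, n_i$. We now show that the mapping $F$ is Lipschitz and strongly monotone. Using the preceding relation, the triangle inequality, and the Cauchy-Schwarz inequality, for any $x, y \in X = \{x \in \mathbb{R}^n \mid Ax \leq b,\ x \geq 0\}$, we have

$$\begin{aligned}
\|F(x) - F(y)\| &\leq \left\|\left(\bar\xi_1(1)\left(\frac{1}{1 + x_1(1)} - \frac{1}{1 + y_1(1)}\right); \ldots; \bar\xi_5(2)\left(\frac{1}{1 + x_5(2)} - \frac{1}{1 + y_5(2)}\right)\right)\right\| + \|2A^TA(x - y)\| \\
&= \left\|\left(\frac{\bar\xi_1(1)\,(x_1(1) - y_1(1))}{(1 + x_1(1))(1 + y_1(1))}; \ldots; \frac{\bar\xi_5(2)\,(x_5(2) - y_5(2))}{(1 + x_5(2))(1 + y_5(2))}\right)\right\| + 2\|A^TA(x - y)\|.
\end{aligned}$$

Using the nonnegativity constraints, from the preceding relation we obtain

$$\|F(x) - F(y)\| \leq \max_{i,r_i}\bar\xi_i(r_i)\,\|x - y\| + 2\|A^TA\|\,\|x - y\| = \left(\max_{i,r_i}\bar\xi_i(r_i) + 2\|A^TA\|\right)\|x - y\|,$$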
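implying that $F$ is Lipschitz with constant $\max_{i,r_i}\bar\xi_i(r_i) + 2\|A^TA\|$. To show the strong monotonicity of $F$, we write

$$\begin{aligned}
(F(x) - F(y))^T(x - y) &= \left(\frac{\bar\xi_1(1)\,(x_1(1) - y_1(1))}{(1 + x_1(1))(1 + y_1(1))}; \ldots; \frac{\bar\xi_5(2)\,(x_5(2) - y_5(2))}{(1 + x_5(2))(1 + y_5(2))}\right)^T(x - y) + 2(x - y)^TA^TA(x - y) \\
&= \sum_{i=1}^N\sum_{r=1}^{n_i}\frac{\bar\xi_i(r)\,(x_i(r) - y_i(r))^2}{(1 + x_i(r))(1 + y_i(r))} + 2(x - y)^T(A^TA)(x - y) \\
&\geq \frac{\min_{i,r_i}\bar\xi_i(r_i)}{(1 + \max_l b(l))^2}\|x - y\|^2 + 2(x - y)^T(A^TA)(x - y) \\
&= (x - y)^T\left(\frac{\min_{i,r_i}\bar\xi_i(r_i)}{(1 + \max_l b(l))^2}I_n + 2A^TA\right)(x - y),
\end{aligned}$$

where the inequality uses $0 \leq x_i(r), y_i(r) \leq \max_l b(l)$, which holds for feasible points since $A$ is binary and every route traverses at least one link. Our choice of matrix $A$ is such that $A^TA$ is positive definite. Thus, the preceding relation implies that $F$ is strongly monotone with parameter

$$\eta = \frac{\min_{i,r_i}\bar\xi_i(r_i)}{(1 + \max_l b(l))^2} + 2\lambda_{\min}(A^TA),$$

where $\lambda_{\min}(A^TA)$ is the minimum eigenvalue of the matrix $A^TA$.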
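The two constants above can be computed directly from $\bar\xi$, $A$, and $b$. The routing matrix of Figure 3 is not reproduced in the text, so the sketch below uses a small hypothetical network (3 links, 2 routes, assumed means $\bar\xi = (1.0, 1.4)$) purely for illustration. It computes $L = \max_{i,r_i}\bar\xi_i(r_i) + 2\|A^TA\|$ and $\eta = \min_{i,r_i}\bar\xi_i(r_i)/(1 + \max_l b(l))^2 + 2\lambda_{\min}(A^TA)$ with plain power iteration, and spot-checks the strong monotonicity inequality at random nonnegative points bounded by $\max_l b(l)$. Function names are illustrative.

```python
import math
import random


def gram(A):
    """A^T A for a matrix stored as a list of rows."""
    cols = len(A[0])
    return [[sum(row[p] * row[q] for row in A) for q in range(cols)] for p in range(cols)]


def largest_eigenvalue(S, iters=500):
    """Largest eigenvalue of a symmetric positive semidefinite matrix by power iteration."""
    n = len(S)
    v = [1.0 + 0.1 * i for i in range(n)]  # generic starting vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(S[p][q] * v[q] for q in range(n)) for p in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            return 0.0
        v = [t / norm for t in w]
        lam = sum(v[p] * sum(S[p][q] * v[q] for q in range(n)) for p in range(n))
    return lam


def bandwidth_constants(xi_bar, A, b):
    """Lipschitz constant L and strong monotonicity constant eta of Sec. 5.1."""
    S = gram(A)
    n = len(S)
    lam_max = largest_eigenvalue(S)  # equals ||A^T A|| for a symmetric PSD matrix
    # For symmetric PSD S: lambda_min(S) = lam_max - lambda_max(lam_max * I - S).
    shifted = [[(lam_max if p == q else 0.0) - S[p][q] for q in range(n)] for p in range(n)]
    lam_min = lam_max - largest_eigenvalue(shifted)
    L = max(xi_bar) + 2.0 * lam_max
    eta = min(xi_bar) / (1.0 + max(b)) ** 2 + 2.0 * lam_min
    return L, eta


def bandwidth_map(x, xi_bar, A):
    """F(x) = -(xi_bar(r) / (1 + x(r)))_r + 2 A^T A x."""
    S = gram(A)
    return [-xi_bar[r] / (1.0 + x[r]) + 2.0 * sum(S[r][q] * x[q] for q in range(len(x)))
            for r in range(len(x))]


if __name__ == "__main__":
    A = [[1, 0], [1, 1], [0, 1]]  # hypothetical 3-link, 2-route network
    xi_bar, b = [1.0, 1.4], [10.0, 15.0, 15.0]
    L, eta = bandwidth_constants(xi_bar, A, b)
    print("L =", L, "eta =", eta)  # about 7.4 and 2.0039
    rng = random.Random(0)
    x = [rng.uniform(0.0, max(b)) for _ in range(2)]
    y = [rng.uniform(0.0, max(b)) for _ in range(2)]
    d = [p - q for p, q in zip(x, y)]
    Fd = [p - q for p, q in zip(bandwidth_map(x, xi_bar, A), bandwidth_map(y, xi_bar, A))]
    print("(F(x)-F(y))^T(x-y) / ||x-y||^2 =",
          sum(f * t for f, t in zip(Fd, d)) / sum(t * t for t in d))
```

For this toy matrix, $A^TA = \begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix}$ has eigenvalues $3$ and $1$, so $L = 1.4 + 6 = 7.4$ and $\eta = 1/256 + 2$.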
5.1.1 Specification of parameters 

In this experiment, the optimal solution $x^*$ of problem (46) is calculated by the sample average approximation (SAA) method using the nonlinear programming solver knitro [5]. Our goal lies in comparing the performance of the DASA scheme given by (25)-(26) with that of SA schemes using harmonic stepsize sequences of the form $\gamma_k = \theta/k$, referred to as HSA schemes. We consider three values of $\theta$ and observe the performance of the HSA scheme in each case. To calculate the stepsize sequence in the DASA scheme, other than $\eta$ and $L$ obtained in the previous part, the parameters $c$, $r_i$, $D$, and $\nu$ need to be evaluated. We assume that c = | and $r_i$ is uniformly drawn from the interval [1,1 + for each user. We let the starting point of all SA schemes be zero, i.e., $x_0 = 0$. Thus, $D = \max_{x\in X}\|x\|$. Since the routing matrix $A$ has binary entries, from $Ax \leq b$ one may conclude that $\sqrt{N}\max_l b(l)$ can be chosen as $D$. To calculate $\nu$, for any $k \geq 0$ we have



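$$\begin{aligned}
\mathbb{E}\left[\|w_k\|^2 \mid \mathcal{F}_k\right] &= \mathbb{E}\left[\|\Phi(x_k, \xi_k) - F(x_k)\|^2 \mid \mathcal{F}_k\right] \\
&= \mathbb{E}\left[\left\|\left(\frac{\bar\xi_1(1) - \xi_{k,1}(1)}{1 + x_{k,1}(1)}; \ldots; \frac{\bar\xi_5(2) - \xi_{k,5}(2)}{1 + x_{k,5}(2)}\right)\right\|^2 \,\Big|\, \mathcal{F}_k\right] \\
&= \mathbb{E}\left[\sum_{i=1}^N\sum_{r=1}^{n_i}\frac{(\xi_{k,i}(r) - \bar\xi_i(r))^2}{(1 + x_{k,i}(r))^2} \,\Big|\, \mathcal{F}_k\right] \\
&\leq \sum_{i=1}^N\sum_{r=1}^{n_i}\mathrm{var}(\xi_{k,i}(r)),
\end{aligned}$$

where the last inequality is obtained using $x_{k,i}(r) \geq 0$. Thus, $\sqrt{\sum_{i=1}^N\sum_{r=1}^{n_i}\mathrm{var}(\xi_{k,i}(r))}$ is a candidate for the parameter $\nu$. On the other hand, $\nu$ also needs to satisfy the lower bound required by Theorem 1. Therefore, we set $\nu$ to the larger of this candidate and that lower bound.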
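For the distributions listed in Sec. 5.1, $\mathrm{var}(\xi_i(r)) = d^2/3$ whenever $\xi_i(r)$ is uniform on $[a - d, a + d]$, so the candidate $\sqrt{\sum_i\sum_r \mathrm{var}(\xi_{k,i}(r))}$ can be evaluated by hand or with the short sketch below (names are illustrative). It computes only this candidate; the lower bound from Theorem 1 that also enters the final choice of $\nu$ is not reproduced here.

```python
import math

# (center a, half-width d, number of i.i.d. entries) per user, as listed in Sec. 5.1
UNIFORM_PARAMS = [
    (1.0, 0.1, 3),   # user 1: xi_1(1), xi_1(2), xi_1(3)
    (1.4, 0.2, 2),   # user 2: xi_2(1), xi_2(2)
    (0.8, 0.05, 1),  # user 3: xi_3(1)
    (1.6, 0.2, 1),   # user 4: xi_4(1)
    (1.2, 0.1, 2),   # user 5: xi_5(1), xi_5(2)
]


def uniform_variance(half_width):
    """Variance of Uniform[a - d, a + d], i.e. (2d)^2 / 12 = d^2 / 3."""
    return half_width ** 2 / 3.0


def nu_candidate(params):
    """Square root of the summed variances: the candidate value for nu."""
    return math.sqrt(sum(count * uniform_variance(d) for _, d, count in params))


if __name__ == "__main__":
    print(round(nu_candidate(UNIFORM_PARAMS), 4))  # 0.2398
```

The summed variance is $0.1725/3 = 0.0575$, giving a candidate of about $0.2398$.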
5.1.2 Sensitivity analysis

We solve the bandwidth-sharing problem for the 12 different parameter settings shown in Table 1. We consider 4 parameters in our model that scale the problem. Here, $m_b$ denotes the multiplier of the capacity vector $b$, $m_c$ denotes the multiplier of the congestion cost function $c(x)$, and $m_\xi$ and $d_\xi$ are two multipliers that parametrize the random variable $\xi$. More precisely, if the random parameter of the $i$-th user on route $r$ is uniformly distributed in $[a - b, a + b]$ (here $a$ and $b$ denote the center and the half-width of the original interval), we now assume that it is uniformly distributed in $[m_\xi a - d_\xi b, m_\xi a + d_\xi b]$. S(i) denotes the $i$-th setting of parameters. For each of these 4 parameters, we consider 3 settings in which that parameter changes while the other parameters are fixed. This allows us to observe the sensitivity of the algorithms with respect to each of these parameters. The SA algorithms are terminated after 4000 iterations. To measure the error of the schemes, we run each scheme 25 times and then compute the mean squared error (MSE) using the metric $\frac{1}{25}\sum_{i=1}^{25}\|x_k^i - x^*\|^2$ for $k = 1, \ldots, 4000$, where $x_k^i$ denotes the $k$-th iterate of the $i$-th sample run. Table 2 shows the 90% confidence intervals (CIs) of the error for the DASA and HSA schemes.

Table 1: The bandwidth-sharing problem: Parameter settings
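The error metric above can be sketched as follows. The text does not state how the 90% CIs are constructed; as an assumption, the sketch uses a Student-t interval for the mean of the 25 per-run squared errors (24 degrees of freedom, two-sided critical value about 1.711). Function names and the sample iterates are hypothetical.

```python
import math
import statistics

T_90_DF24 = 1.711  # assumed two-sided 90% Student-t critical value, 24 degrees of freedom


def squared_error(x, x_star):
    """||x - x*||^2 for one run at a fixed iteration k."""
    return sum((a - b) ** 2 for a, b in zip(x, x_star))


def mse_with_ci(per_run_errors, t_crit=T_90_DF24):
    """Return the MSE (sample mean over runs) and a t-based interval around it."""
    mean = statistics.mean(per_run_errors)
    half = t_crit * statistics.stdev(per_run_errors) / math.sqrt(len(per_run_errors))
    return mean, (mean - half, mean + half)


if __name__ == "__main__":
    # Hypothetical iterates of 25 runs at some iteration k, around x* = (1, 2).
    x_star = [1.0, 2.0]
    runs = [[1.0 + 0.01 * (i - 12), 2.0] for i in range(25)]
    errors = [squared_error(x, x_star) for x in runs]
    print(mse_with_ci(errors))
```

Repeating this computation for every $k = 1, \ldots, 4000$ yields the MSE curves and interval bands of the kind reported in the tables and figures.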



| Parameter | S(i) | DASA - 90% CI | HSA with θ = 0.1 - 90% CI | HSA with θ = 1 - 90% CI | HSA with θ = 10 - 90% CI |
| $m_b$ | S(1) | [2.97e-6, 4.66e-6] | [1.52e-6, 2.37e-6] | [1.70e-6, 2.97e-6] | [1.33e-5, 1.81e-5] |
| $m_b$ | S(2) | [2.97e-6, 4.66e-6] | [1.52e-6, 2.37e-6] | [1.70e-6, 2.97e-6] | [1.33e-5, 1.81e-5] |
| $m_b$ | S(3) | [1.15e-7, 3.04e-7] | [2.12e-8, 4.92e-8] | [4.66e-8, 1.17e-7] | [8.07e-7, 2.43e-6] |
| $m_c$ | S(4) | [4.39e-7, 6.55e-7] | [1.33e-6, 1.80e-6] | [4.71e-7, 8.75e-7] | [3.84e-6, 5.38e-6] |
| $m_c$ | S(5) | [1.29e-6, 1.97e-6] | [9.00e-6, 1.20e-5] | [7.88e-7, 1.36e-6] | [5.61e-6, 7.98e-6] |
| $m_c$ | S(6) | [3.44e-6, 5.36e-6] | [2.26e-4, 2.53e-4] | [1.25e-6, 1.99e-6] | [7.34e-6, 1.12e-5] |
| $m_\xi$ | S(7) | [4.29e-5, 6.40e-5] | [7.92e-5, 1.49e-4] | [2.83e-5, 4.75e-5] | [1.84e-4, 2.75e-4] |
| $m_\xi$ | S(8) | [3.18e-5, 4.83e-5] | [3.46e-5, 6.07e-5] | [1.97e-5, 3.39e-5] | [1.40e-4, 1.99e-4] |
| $m_\xi$ | S(9) | [1.83e-5, 2.88e-5] | [6.12e-6, 9.99e-6] | [1.06e-5, 1.85e-5] | [8.33e-5, 1.13e-4] |
| $d_\xi$ | S(10) | [3.82e-4, 5.91e-4] | [2.86e+1, 2.86e+1] | [5.50e-1, 5.70e-1] | [7.23e-5, 9.64e-5] |
| $d_\xi$ | S(11) | [9.81e-4, 1.44e-3] | [2.86e+1, 2.86e+1] | [5.45e-1, 5.85e-1] | [2.85e-4, 3.80e-4] |
| $d_\xi$ | S(12) | [6.26e-3, 8.44e-3] | [2.85e+1, 2.86e+1] | [5.47e-1, 6.44e-1] | [1.77e-3, 2.36e-3] |

Table 2: The bandwidth-sharing problem - 90% CIs for DASA and HSA schemes



5.1.3 Results and insights 

We observe that the DASA scheme performs favorably and is far more robust than the HSA schemes with different choices of $\theta$. Importantly, in most of the settings, DASA stays close to the HSA scheme with the minimum MSE. Note that when $\theta = 1$ or $\theta = 10$, the stepsize $\theta/k$ does not lie in the admissible stepsize interval of Prop. 4 for small $k$ and is therefore not feasible in the sense of Prop. 4. Comparing the performance of each HSA scheme across settings, we observe that HSA schemes are fairly sensitive to the choice of parameters. For example, HSA with $\theta = 0.1$ performs very well in settings S(1), S(2), and S(3), while its performance deteriorates in settings S(10), S(11), and S(12). A similar observation holds for the other two HSA schemes, as illustrated in Figure 4: for example, the HSA scheme with $\theta = 10$ performs poorly in settings S(1) and S(4), while it outperforms the other schemes in setting S(11). We also observe that changing $m_b$ from 1 to 0.1 does not affect the error. This is because the optimal solution $x^*$ remains feasible for the smaller capacity vector $b$. On the other hand, the error decreases when we use $m_b = 0.01$. Figure 5 presents the flow rates of the users on different routes for the setting S(4). One immediate observation is that the flow rates of the HSA scheme with $\theta = 10$ fluctuate noticeably in the beginning due to a very large stepsize. Figure 6 shows the 90% CIs for the setting S(4) in two formats: the left half of each plot shows the intervals as line segments, while the right half shows the lower and upper bounds of the intervals as continuous curves. The colored points represent the 25 sample errors at the corresponding iterations. We see that the DASA scheme and the HSA scheme with $\theta = 1$ have CIs of similar size and a smooth mean, while the mean of the HSA scheme with $\theta = 10$ is nonsmooth and oscillates more as the algorithm proceeds.
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— HSA with 6=0.1 

— HSA with 6=1 
-HSA with 6 =10 



(a) Setting S(l) (b) Setting S(4) (c) Setting S(ll) 

Figure 4: The bandwidth-sharing problem - MSE - DASA vs. HSA schemes 
Figure 5: The bandwidth-sharing problem - flow rates for the setting S(4)



(a) DASA (b) HSA with θ = 1 (c) HSA with θ = 10

Figure 6: The bandwidth-sharing problem - 90% CIs for the setting S(4)

5.2 A networked stochastic Nash-Cournot game

Consider a networked Nash-Cournot game akin to that described in Example 1. Specifically, let firm $i$'s generation and sales decisions at node $j$ be given by $g_{ij}$ and $s_{ij}$, respectively. Suppose the price function $P_j$ is given by $P_j(s_j, a_j, b_j) = a_j - b_j s_j^{\sigma}$, where $s_j = \sum_{i=1}^N s_{ij}$, $\sigma > 1$, and $a_j$ and $b_j$ are uniformly distributed random variables defined over the intervals $[lb_{a_j}, ub_{a_j}]$ and $[lb_{b_j}, ub_{b_j}]$, respectively. For simplicity, we assume that the generation cost is linear and is given by $c_{ij}g_{ij}$. We also impose a bound on sales decisions, specified by $s_{ij} \leq cap'_{ij}$ for all $i$ and $j$. Note that sales decisions are always bounded by aggregate generation capacity. The optimization model for the $i$-th firm is given by:



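$$\begin{aligned}
\text{minimize} \quad & \mathbb{E}\left[\sum_{j=1}^M\left(c_{ij}g_{ij} - s_{ij}\left(a_j - b_j\Big(\sum_{l=1}^N s_{lj}\Big)^{\sigma}\right)\right)\right] \\
\text{subject to} \quad & x_i = (s_{i\cdot}; g_{i\cdot}) \in X_i \triangleq \left\{(s_{i\cdot}; g_{i\cdot}) \;\middle|\; \sum_{j=1}^M g_{ij} = \sum_{j=1}^M s_{ij},\ g_{ij} \leq cap_{ij},\ s_{ij} \leq cap'_{ij},\ s_{ij}, g_{ij} \geq 0,\ j = 1, \ldots, M\right\}.
\end{aligned} \qquad (47)$$

As discussed in [15], when $1 < \sigma \leq 3$ and $M \leq \frac{3\sigma - 1}{\sigma - 1}$, the mapping $F$ is strictly monotone, and strong monotonicity can be induced using a regularized mapping, which suits our interest in strongly monotone problems. On the other hand, when $\sigma > 1$, it is difficult to verify whether the mapping $F$ is Lipschitz. This motivates us to employ the distributed locally randomized SA schemes introduced in Sec. 4.3. Using regularization and randomization, we solve VI$(X, F^{\epsilon} + \eta I)$, where $\eta > 0$ is the regularization parameter and $F^{\epsilon}$ is defined by (27). As a consequence, this problem admits a unique solution, denoted by $x^*_{\epsilon}$.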



5.2.1 SA algorithms 

In this experiment, we use four different SA schemes, described in Sec. 3.2 and Sec. 4, for solving VI$(X, F^{\epsilon} + \eta I)$:

MSR-DASA scheme. In this scheme, we employ algorithm (38) and assume that the random vector $z$ is generated via the MSR scheme, i.e., $z_i$ is uniformly distributed on the set $B_{n_i}(0, \epsilon_i)$, while the mapping $F^{\epsilon}$ is defined by (27). One immediate benefit of applying this scheme is that the Lipschitz parameter can be estimated from Prop. 6b. Moreover, we assume that the stepsizes $\gamma_{k,i}$ are given by (25)-(26). The multiplier $r_i$ is randomly chosen for each firm within the prescribed range. The constant $c$ is maintained at ^. The parameters $D$ and $\nu$ need to be estimated, while the Lipschitz parameter $L$ is obtained from Prop. 6b, i.e.,

$$L = \sqrt{N}\,\|C\|\max_{j=1,\ldots,N}\left\{\frac{\kappa_j\, n_j!!}{(n_j - 1)!!\,\epsilon_j}\right\}.$$

MSR-HSA schemes. Analogous to the MSR-DASA scheme, this scheme uses the distributed locally randomized SA algorithm (38), where for any $i = 1, \ldots, N$, the random vector $z_i$ is uniformly drawn from the ball $B_{n_i}(0, \epsilon_i)$ and the mapping $F^{\epsilon}$ is defined by (27). The difference is that here every firm uses the harmonic stepsize $\theta/k$ at the $k$-th iteration, where $\theta > 0$.

MCR-DASA scheme. This scheme is similar to the MSR-DASA scheme with one key difference: the random vector $z$ is generated by the MCR scheme, i.e., for any $i = 1, \ldots, N$, the random vector $z_i$ is uniformly drawn from the cube $C_{n_i}(0, \epsilon_i)$ independently of every $z_j$ with $j \neq i$. The Lipschitz constant $L$ required for calculating the stepsizes is given by Prop. 7b:

$$L = \frac{\sqrt{n}\,\|C\|}{\min\{\epsilon_1, \ldots, \epsilon_N\}}.$$

MCR-HSA schemes. This scheme uses algorithm (38) with the multi-cubic uniform random vector $z$. The stepsizes in this scheme are harmonic, of the form $\theta/k$.

To obtain the solution $x^*_{\epsilon}$, we use the HSA scheme with the stepsizes ^ for 20000 iterations. Note that in this experiment, when we use the DASA scheme, we do not enforce the lower bound on $\nu$ required by Theorem 1 and instead only require $\nu > D$. The condition $\nu > D$ keeps the adaptive stepsizes positive for every $k$. As a consequence of dropping that lower bound, the adaptive stepsizes become larger and of the order of the harmonic stepsizes in our analysis. With this change, the convergence of the DASA algorithm is still guaranteed, while the result of Theorem 1d does not necessarily hold.
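The two Lipschitz estimates can be compared numerically. The sketch below assumes $\kappa_j = 1$ for odd $n_j$ and $\kappa_j = 2/\pi$ for even $n_j$ (the usual factor for uniform smoothing over a ball; this should be checked against Prop. 6b), and takes $n_i = 6$ for each of the 5 firms of Sec. 5.2.2, i.e., three sales and three generation variables per firm, with a common $\epsilon$ and $\|C\| = 1$. Under these assumptions the MSR constant is smaller than the MCR constant, which is consistent with the explanation offered in Sec. 5.2.3 for the slightly smaller errors of the MSR schemes. Function names are illustrative.

```python
import math


def double_factorial(m):
    """m!! with the convention 0!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def kappa(n):
    """Assumed spherical-smoothing factor: 1 for odd n, 2/pi for even n."""
    return 1.0 if n % 2 == 1 else 2.0 / math.pi


def lipschitz_msr(dims, eps, c_norm):
    """sqrt(N) ||C|| max_j kappa_j n_j!! / ((n_j - 1)!! eps_j)  (Prop. 6b form)."""
    return math.sqrt(len(dims)) * c_norm * max(
        kappa(n) * double_factorial(n) / (double_factorial(n - 1) * e)
        for n, e in zip(dims, eps))


def lipschitz_mcr(dims, eps, c_norm):
    """sqrt(n) ||C|| / min_j eps_j  (Prop. 7b), with n the sum of the block sizes."""
    return math.sqrt(sum(dims)) * c_norm / min(eps)


if __name__ == "__main__":
    dims, eps = [6] * 5, [0.1] * 5
    print("MSR:", lipschitz_msr(dims, eps, 1.0))  # about 45.55
    print("MCR:", lipschitz_mcr(dims, eps, 1.0))  # about 54.77
```

In one dimension both formulas reduce to $\|C\|/\epsilon$, so the difference appears only through the block sizes $n_i$.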
5.2.2 Sensitivity analysis

We consider a Nash-Cournot game with 5 firms over a network with 3 nodes. We set $\sigma = 1.1$, $lb_{b_j} = 0.04$, $ub_{b_j} = 0.05$, and $lb_{a_j} = 1$ for every $j$, and $ub_a = (1.5; 2; 2.5)$. With these parameters fixed, our test problems are generated by varying the remaining model parameters: the parameter $\epsilon$ of the locally randomized schemes, the regularization parameter $\eta$, the starting point $x_0$ of the SA algorithm, and the multiplier $M_a$ for the random variable $a_j$ for every $j$. We also consider two different settings for $cap_{ij}$ and $cap'_{ij}$. Note that when $cap_{ij} = 1$, the constraints $s_{ij} \leq 3$ are redundant and can be removed. In our analysis, we assume that $\epsilon_i = \epsilon$ is identical for all firms. Similar to the first experiment in Sec. 5.1.2, we consider a set of test problems corresponding to each of these parameters. In each set, one parameter takes 3 different values, while the other parameters are fixed. Table 3 lists the resulting 12 test problems. Note that $P_1$, $P_2$, and $P_3$ are three different feasible starting points. More precisely, $P_1 = 0$, $P_2 = 0.5\,(cap'; cap)$, and $P_3 = (cap'; cap)$. As in the first experiment, the termination criterion is running the SA algorithms for 4000 iterations. We run each algorithm 25 times and then obtain the MSE of the form $\frac{1}{25}\sum_{i=1}^{25}\|x_k^i - x^*_{\epsilon}\|^2$ for $k = 1, \ldots, 4000$. Table 4 and Table 5 show the 90% CIs of the error for the described schemes.

Table 3: The stochastic Nash-Cournot game - settings of parameters

5.2.3 Results and insights 

Table 4 presents the simulation results for the test problems using the MSR-DASA and MSR-HSA schemes. One observation is that the effect of changing the parameter $\epsilon$ on the error of the schemes is negligible; we only see a slight change in the error of the MSR-HSA scheme with $\theta = 10$. Comparing the order of the error, we notice that the MSR-DASA scheme ranks second among all schemes in the first set of test problems. In the second set, as $\eta$ is decreased, the error of all the schemes except the MSR-HSA scheme with $\theta = 0.1$ first decreases and then increases. This is not an odd observation, since we used $x^*_{\epsilon}$ instead of $x^*$ to measure the errors, and $x^*_{\epsilon}$ itself changes when $\eta$ or $\epsilon$ changes. In this set, the MSR-DASA scheme still has the second-best error among all schemes. The schemes are not very sensitive to the choice of $x_0$, and the second place is still held by the MSR-DASA scheme. Finally, in the last set, we see that increasing the factor $M_a$, as expected, increases the error in most of the schemes. The reason is that increasing $M_a$ increases both the mean and the variance of the random variable $a$. Importantly, we observe that our MSR-DASA scheme remains very robust compared with the MSR-HSA schemes. Table 5 shows the error estimates using the MCR-DASA and MCR-HSA schemes. Comparing these results with those of the MSR schemes in Table 4, we see that the sensitivity of the MCR schemes to the parameters is very similar to that of the MSR schemes, and the MCR-DASA scheme performs as the second best among all MCR schemes. We also see that in most of the settings, the error of the MSR-DASA scheme is slightly smaller than the error of the MCR-DASA scheme. One reason may be that the MSR scheme has a smaller Lipschitz constant than the MCR scheme for our problem settings.

| Parameter | S(i) | DASA - 90% CI | HSA with θ = 0.1 - 90% CI | HSA with θ = 1 - 90% CI | HSA with θ = 10 - 90% CI |
[1.38e-2,2.37e-2]
[8.43e-3,1.62e-2]
[7.43e-3,1.44e-2]
[1.61e-2,2.39e-2]

[5.28e-4,1.08e-3]
[2.59e-4,5.76e-4]
[5.06e-4,8.65e-4]
| $x_0$ | S(7) | [2.68e-6, 3.48e-6] | [4.37e-1, 5.13e-1] | [1.37e-6, 1.92e-6] | [6.71e-6, 9.21e-6] |
| $x_0$ | S(8) | [2.68e-6, 3.48e-6] | [2.22e-5, 2.91e-5] | [1.37e-6, 1.92e-6] | [6.71e-6, 9.21e-6] |
| $x_0$ | S(9) | [2.68e-6, 3.48e-6] | [2.22e-5, 2.91e-5] | [1.37e-6, 1.92e-6] | [6.71e-6, 9.21e-6] |
| $M_a$ | S(10) | [4.45e-3, 9.25e-3] | [5.79e-1, 9.25e-1] | [1.67e-3, 5.72e-3] | [2.72e-5, 2.07e-2] |
| $M_a$ | S(11) | [8.85e-3, 1.73e-2] | [1.25e+0, 2.12e+0] | [9.38e-4, 1.82e-2] | [4.52e-3, 3.22e-2] |
| $M_a$ | S(12) | [1.92e-2, 3.91e-2] | [8.51e-1, 2.31e+0] | [1.87e-3, 4.15e-2] | [1.04e-2, 7.23e-2] |

Table 4: The stochastic Nash-Cournot game - 90% CIs for MSR-DASA and MSR-HSA schemes



| Parameter | S(i) | DASA - 90% CI | HSA with θ = 0.1 - 90% CI | HSA with θ = 1 - 90% CI | HSA with θ = 10 - 90% CI |
| $\epsilon$ | S(1) | [1.22e-2, 2.55e-2] | [1.84e+1, 1.88e+1] | [1.78e-1, 2.29e-1] | [2.42e-3, 4.21e-3] |
| $\epsilon$ | S(2) | [1.21e-2, 2.53e-2] | [1.84e+1, 1.88e+1] | [1.78e-1, 2.28e-1] | [2.37e-3, 4.13e-3] |
| $\epsilon$ | S(3) | [1.21e-2, 2.53e-2] | [1.84e+1, 1.88e+1] | [1.78e-1, 2.28e-1] | [2.37e-3, 4.13e-3] |
| $\eta$ | S(4) | [4.17e-3, 9.50e-3] | [1.65e+0, 1.74e+0] | [9.37e-3, 1.84e-2] | [7.38e-4, 1.73e-3] |
| $\eta$ | S(5) | [1.41e-3, 4.06e-3] | [1.85e+0, 1.93e+0] | [6.88e-3, 1.32e-2] | [2.85e-4, 5.06e-4] |
| $\eta$ | S(6) | [8.19e-3, 1.88e-2] | [2.37e+0, 2.46e+0] | [1.85e-2, 3.10e-2] | [4.18e-4, 7.05e-4] |
| $x_0$ | S(7) | [2.25e-5, 2.88e-5] | [4.31e-1, 5.12e-1] | [9.41e-6, 1.18e-5] | [3.99e-5, 5.27e-5] |
| $x_0$ | S(8) | [2.25e-5, 2.88e-5] | [1.13e-4, 1.58e-4] | [9.40e-6, 1.18e-6] | [3.99e-5, 5.27e-5] |
| $x_0$ | S(9) | [2.25e-5, 2.88e-5] | [1.13e-4, 1.58e-4] | [9.40e-6, 1.18e-5] | [3.99e-5, 5.27e-5] |
| $M_a$ | S(10) | [1.66e-3, 4.29e-3] | [6.17e-1, 8.88e-1] | [4.21e-4, 1.79e-3] | [3.82e-4, 8.30e-3] |
| $M_a$ | S(11) | [3.03e-3, 1.22e-2] | [1.29e+0, 2.23e+0] | [9.63e-4, 5.77e-3] | [2.48e-3, 2.52e-2] |
| $M_a$ | S(12) | [6.05e-3, 2.60e-2] | [8.50e-1, 2.49e+0] | [2.27e-3, 1.29e-2] | [5.54e-3, 5.67e-2] |

Table 5: The stochastic Nash-Cournot game - 90% CIs for MCR-DASA and MCR-HSA schemes
Figure 7: The stochastic Nash-Cournot game - comparison among all the schemes 

Figure 7 illustrates a comparison among the different schemes described in Sec. 5.2.1 for settings 
S(5) and S(8). All the MSR schemes are shown with solid lines, while the MCR schemes are shown with 
dashed lines. A few observations are immediate. Regarding the order of the error, in both settings S(5) 
and S(8), the schemes with the distributed adaptive stepsizes given by (25)-(26) are the second best among 
the MSR schemes and among the MCR schemes. This indicates the robustness of the DASA scheme compared with 
the HSA schemes. We also observe that in setting S(5), the HSA schemes with θ = 10 (both MSR and MCR) 
have the minimum error, while in setting S(8), the HSA schemes with θ = 1 have the minimum error. This 
illustrates the sensitivity of the HSA schemes to the problem parameters. Let us now compare the MSR 
schemes with the MCR schemes. In setting S(5), the MSR and MCR schemes perform very similarly, and it is 
in fact hard to distinguish their errors. In setting S(8), on the other hand, the MSR schemes perform 
better than their MCR counterparts. Figure 8 illustrates the 90% confidence intervals for the MSR schemes 
in setting S(5). These intervals are shown as line segments in the left half of each plot and as 
continuous bounds in the right half. The colored points represent the samples at each iteration. 
Importantly, we observe that the CIs of the MSR-DASA scheme are as tight as those of the MSR-HSA scheme 
with θ = 1 and tighter than those of the MSR-HSA scheme with θ = 10. Figure 9 shows a similar 
comparison for the MCR schemes. 
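The sensitivity of the HSA schemes to θ can be reproduced qualitatively on a toy problem. The sketch below is an illustration under stated assumptions, not the Nash-Cournot model of Sec. 5: it runs unprojected SA with harmonic steps θ/k on the hypothetical scalar strongly monotone map F(x) = ηx - b, observed with additive Gaussian noise. The function `hsa_error` and all constants are hypothetical.

```python
import random


def hsa_error(theta, num_iters=1000, eta=1.0, b=1.0, x0=10.0, sigma=0.1, seed=0):
    """SA with harmonic steps gamma_k = theta / k on the toy map F(x) = eta * x - b
    (strongly monotone, solution x* = b / eta), where each evaluation of F is
    corrupted by Gaussian noise. Returns the final error |x_K - x*|."""
    rng = random.Random(seed)
    x_star = b / eta
    x = x0
    for k in range(1, num_iters + 1):
        gamma = theta / k
        sample = eta * x - b + rng.gauss(0.0, sigma)  # noisy evaluation of F(x)
        x = x - gamma * sample
    return abs(x - x_star)


for theta in (0.1, 1.0, 10.0):
    print(theta, hsa_error(theta))
```

With η = 1, the choice θ = 0.1 still leaves an error of order one after 1000 iterations, because the deterministic part of the error decays only like k^(-θη). Both θ = 1 and θ = 10 reach the noise level. Which of these two is better depends on the problem constants, which is consistent with the best θ changing between S(5) and S(8).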

[Figure 8 panels: (a) MSR-DASA scheme; (b) MSR-HSA scheme with θ = 1; (c) MSR-HSA scheme with θ = 10. Horizontal axis: Iteration.]

Figure 8: The stochastic Nash-Cournot game - setting S(5) - MSR-DASA vs. MSR-HSA schemes 

Figure 9: The stochastic Nash-Cournot game - setting S(5) - MCR-DASA vs. MCR-HSA schemes 

6 Concluding remarks 

We consider the solution of strongly monotone Cartesian stochastic variational inequality problems through 
stochastic approximation (SA) schemes. Motivated by the naive stepsize rules employed in most SA 
implementations, we develop a recursive rule that adapts to problem parameters such as the Lipschitz and 
monotonicity constants of the map and ensures almost-sure convergence of the iterates to the unique 
solution. An extension to the distributed multi-agent regime is provided. A shortcoming of this approach 
is its reliance on the availability of a Lipschitz constant. This motivates the construction of two locally 
randomized techniques to cope with instances where the mapping is either not Lipschitz continuous or its 
Lipschitz constant is difficult to estimate. For each of these techniques, we show that an approximation 
of the original mapping is Lipschitz continuous with a prescribed constant. We use these techniques to 
develop a distributed locally randomized adaptive steplength SA scheme in which, at each iteration, the 
argument of the mapping is perturbed by a random vector drawn from a prescribed uniform distribution. We 
show that this scheme produces iterates that converge to a solution of an approximate problem, and that 
the sequence of approximate solutions converges to the unique solution of the original stochastic 
variational inequality problem. In Sec. 5, we apply our schemes to two problems: a bandwidth-sharing 
problem in communication networks and a networked stochastic Nash-Cournot game. Through these examples, 
we observe that in both problems the adaptive distributed stepsize scheme is far more robust than the 
standard implementations that use harmonic stepsizes of the form θ/k. Furthermore, the randomized 
smoothing techniques prove useful in the Cournot setting, where Lipschitz constants cannot be easily 
derived. 
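The locally randomized scheme summarized above can be sketched in a few lines. This is a minimal, centralized sketch under several assumptions: cubic randomization of the argument, a box feasible set (so the projection is a componentwise clip), and a harmonic step gamma0/(k+1) in place of the paper's adaptive distributed rule. The functions `locally_randomized_sa` and `F_sample` are hypothetical. For the toy map, F(x) = sign(x)·sqrt(|x|) - 0.5, the solution on [-2, 2] is x* = 0.25, and the smoothed problem's solution lies close to it for small eps.

```python
import math
import random


def F_sample(y, rng):
    """Hypothetical noisy monotone map with sqrt-type growth (not Lipschitz at 0)."""
    return [math.copysign(math.sqrt(abs(v)), v) - 0.5 + rng.gauss(0.0, 0.1) for v in y]


def locally_randomized_sa(F_sample, x0, lower, upper, eps, gamma0, num_iters, seed=0):
    """Projected SA applied to the smoothed map F_eps(x) = E[F(x + z, xi)]:
    at every iteration the argument is perturbed by z ~ Uniform([-eps, eps]^n)."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(num_iters):
        gamma = gamma0 / (k + 1)  # harmonic step, assumed here for simplicity
        z = [rng.uniform(-eps, eps) for _ in x]
        g = F_sample([xi + zi for xi, zi in zip(x, z)], rng)
        # Projection onto the box [lower, upper]^n is a componentwise clip.
        x = [min(max(xi - gamma * gi, lower), upper) for xi, gi in zip(x, g)]
    return x


print(locally_randomized_sa(F_sample, [1.5], -2.0, 2.0, 0.05, 1.0, 20000))
```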